Student starter code (30% baseline)
- `index.html` - Main HTML page
- `script.js` - JavaScript logic
- `styles.css` - Styling and layout
- `package.json` - Dependencies
- `setup.sh` - Setup script
- `README.md` - Instructions (below)

💡 Download the ZIP, extract it, and follow the instructions below to get started!
Estimated Time: 60 minutes
Course: AI-1 Data Analysis and Data Science
Activity: 15
Master the essential data manipulation library! This activity introduces Pandas DataFrames and Series - the backbone of data analysis workflows.
By the end of this activity, you will be able to:
Choose your development environment based on your needs:
Best for: Local development with cloud runtime consistency
Setup (first time only):
Open in VS Code:
code activity-15-pandas-basics.ipynb
Select Google Colab Kernel:
`Cmd+Shift+P` -> Type "Select Kernel" -> Choose "Google Colab"

Authenticate:
Run First Cell:
Advantages:
First Connection: ~30-60 seconds
Subsequent Connections: ~10-15 seconds
Best for: Quick access from any device, no setup required
Setup:
Open in Browser:
activity-15-pandas-basics.ipynb

Run Cells:
`Shift+Enter`

Advantages:
Note: Web Colab has the same free tier limits as VS Code integration
Best for: Offline work, quick edits, familiar environment
Setup:
# Install dependencies (one-time)
pip install -r requirements.txt
# Start Jupyter Notebook
jupyter notebook activity-15-pandas-basics.ipynb
# Or use Jupyter Lab (modern interface)
jupyter lab activity-15-pandas-basics.ipynb
Advantages:
Limitations:
Use When: Quick edits, no internet, testing local changes
Note: Large DataFrames (100k+ rows) may increase memory usage. Use chunking for datasets >1 GB.
Before starting, understand these session limits:
| Limit | Value | Impact |
|---|---|---|
| Session Duration | 12 hours max | Restart every 12 hours |
| Idle Timeout | 90 minutes | Disconnects if inactive |
| RAM | 12GB | Sufficient for this activity |
💡 Tips:
- `Cmd+S` (Mac) or `Ctrl+S` (Windows) to save your work
- `df.to_csv('checkpoint.csv', index=False)` to checkpoint a DataFrame to disk

65% of the code is implemented for you:
Before jumping into TODOs:
Estimated Time: 10 minutes
What You'll Build: Create Pandas DataFrames from dictionaries, lists, and NumPy arrays.
Success Criteria:
- `.info()` and `.describe()`

Hints:
- Dictionary: `pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})`
- List with column names: `pd.DataFrame(data, columns=['A', 'B', 'C'])`
- NumPy array: `pd.DataFrame(np.array([[1,2], [3,4]]), columns=['X', 'Y'])`
- `df.info()` shows types, `df.describe()` shows statistics
- `df.index = ['row1', 'row2']` for custom row labels

Expected Output:
DataFrame from dictionary:
Name Age City
0 Alice 25 New York
1 Bob 30 London
2 Carol 28 Paris
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Name 3 non-null object
1 Age 3 non-null int64
2 City 3 non-null object
Key Concepts:
Code Location: Cell 5 (search for # TODO 1:)
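Putting the hints for this TODO together, here is a minimal sketch (using the sample data from the expected output above; not the graded solution):

```python
import numpy as np
import pandas as pd

# From a dictionary: keys become column names
df_dict = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol'],
    'Age': [25, 30, 28],
    'City': ['New York', 'London', 'Paris'],
})

# From a NumPy array, with explicit column names and custom row labels
df_arr = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=['X', 'Y'])
df_arr.index = ['row1', 'row2']

# Inspect structure and summary statistics
df_dict.info()
print(df_dict.describe())
```

`.describe()` only summarizes numeric columns by default, so here it reports on `Age` alone.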
Estimated Time: 10 minutes
What You'll Build: Master different ways to access DataFrame data: columns, rows, and specific cells.
Success Criteria:
- `.loc[]` (label-based)
- `.iloc[]` (integer-based)

Hints:
- Single column: `df['column_name']` or `df.column_name`
- Multiple columns: `df[['col1', 'col2']]` (note double brackets)
- Rows by label: `df.loc['row_label']` or `df.loc[0:2]` (inclusive)
- Rows by position: `df.iloc[0]` or `df.iloc[0:3]` (exclusive)
- Single cell: `df.loc['row', 'col']` or `df.iloc[0, 1]`

Expected Output:
Single column (Series):
0 25
1 30
2 28
Name: Age, dtype: int64
Multiple columns (DataFrame):
Name Age
0 Alice 25
1 Bob 30
2 Carol 28
Row by label (loc):
Name Bob
Age 30
City London
Name: 1, dtype: object
Row by position (iloc):
Name Bob
Age 30
City London
Name: 1, dtype: object
Key Concepts:
- `.loc[]`: Label-based indexing (uses row/column names)
- `.iloc[]`: Position-based indexing (uses integer positions)
- `loc` is inclusive, `iloc` is exclusive (like Python lists)

Code Location: Cell 7 (search for `# TODO 2:`)
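A minimal sketch of the selection patterns above, on the sample data from the expected output:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol'],
    'Age': [25, 30, 28],
    'City': ['New York', 'London', 'Paris'],
})

ages = df['Age']              # single column -> Series
pair = df[['Name', 'Age']]    # list of columns -> DataFrame
row_label = df.loc[1]         # label-based (default labels are 0, 1, 2)
row_pos = df.iloc[1]          # position-based; same row here
cell = df.loc[1, 'City']      # single cell by label
print(cell)
```

With the default `RangeIndex`, labels and positions coincide, which is why `df.loc[1]` and `df.iloc[1]` return the same row; the difference shows once you set custom row labels.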
Estimated Time: 15 minutes
What You'll Build: Filter DataFrames using conditional expressions (essential for data analysis).
Success Criteria:
- `&` (AND) and `|` (OR)
- `.isin()` to filter by a list of values
- `.isna()` / `.notna()`

Hints:
- Single condition: `df[df['Age'] > 25]`
- AND: `df[(df['Age'] > 25) & (df['City'] == 'Paris')]` (note parentheses!)
- OR: `df[(df['Age'] > 30) | (df['City'] == 'London')]`
- List membership: `df[df['City'].isin(['Paris', 'London'])]`
- Non-missing values: `df[df['Age'].notna()]`

Expected Output:
Age greater than 25:
Name Age City
1 Bob 30 London
2 Carol 28 Paris
Multiple conditions (Age > 25 AND City = 'Paris'):
Name Age City
2 Carol 28 Paris
Filter by list (City in ['Paris', 'London']):
Name Age City
1 Bob 30 London
2 Carol 28 Paris
Key Concepts:
- Comparison operators: `>`, `<`, `==`, `!=`, `>=`, `<=`
- Boolean operators: `&` (AND), `|` (OR), `~` (NOT) - use parentheses!
- `.isin()`: Check if values are in a list
- `.isna()` / `.notna()`: Detect missing values (NaN)

Code Location: Cell 9 (search for `# TODO 3:`)
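The filtering hints above, sketched on the same sample data (a reference sketch, not the graded cell):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol'],
    'Age': [25, 30, 28],
    'City': ['New York', 'London', 'Paris'],
})

older = df[df['Age'] > 25]                                # single condition
both = df[(df['Age'] > 25) & (df['City'] == 'Paris')]     # AND: parentheses required
either = df[(df['Age'] > 30) | (df['City'] == 'London')]  # OR
in_list = df[df['City'].isin(['Paris', 'London'])]        # membership in a list
print(older)
```

The parentheses matter because `&` and `|` bind tighter than comparison operators in Python; without them you get the "truth value of a Series is ambiguous" error covered in Troubleshooting.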
Estimated Time: 10 minutes
What You'll Build: Add, modify, and delete columns; sort data; handle duplicates.
Success Criteria:
- `.drop()`

Hints:
- New column: `df['new_col'] = df['col1'] + df['col2']`
- Modify in place: `df['Age'] = df['Age'] + 1` (increment all ages)
- Delete: `df.drop('col_name', axis=1, inplace=True)` (axis=1 for columns)
- Sort ascending: `df.sort_values('Age')`
- Sort descending: `df.sort_values('Age', ascending=False)`
- Multi-column sort: `df.sort_values(['City', 'Age'])`

Expected Output:
Added 'Age_Plus_5' column:
Name Age City Age_Plus_5
0 Alice 25 New York 30
1 Bob 30 London 35
2 Carol 28 Paris 33
Sorted by Age (ascending):
Name Age City
0 Alice 25 New York
2 Carol 28 Paris
1 Bob 30 London
Sorted by City then Age:
Name Age City
1 Bob 30 London
0 Alice 25 New York
2 Carol 28 Paris
Key Concepts:
- Add a column: `df['name'] = values`
- `.drop()`: Remove columns (`axis=1`) or rows (`axis=0`)
- `inplace=True`: Modify DataFrame directly (no return value)
- `.sort_values()`: Sort by column(s), default ascending
- Chain operations: `df.sort_values('Age').head(5)`

Code Location: Cell 11 (search for `# TODO 4:`)
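The column and sorting operations above can be sketched like this (note the derived column name `Age_Plus_5` matches the expected output; the rest is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol'],
    'Age': [25, 30, 28],
    'City': ['New York', 'London', 'Paris'],
})

df['Age_Plus_5'] = df['Age'] + 5              # derived column from existing one
by_age = df.sort_values('Age')                # ascending by default
by_city_age = df.sort_values(['City', 'Age']) # sort by City, then Age within City
df = df.drop('Age_Plus_5', axis=1)            # remove the column again (axis=1)
print(by_age)
```

`sort_values` returns a new DataFrame rather than sorting in place, so the original row order in `df` is untouched unless you reassign.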
Estimated Time: 10 minutes
What You'll Build: Detect and handle missing values (NaN) using Pandas methods.
Success Criteria:
- `.isna()` and `.sum()`
- `.dropna()`
- `.fillna()`
- `.replace()`

Hints:
- Count missing: `df.isna().sum()` (per column)
- Drop rows with any NaN: `df.dropna()`
- Drop only all-NaN rows: `df.dropna(how='all')`
- Drop columns with NaN: `df.dropna(axis=1)`
- Fill: `df.fillna(0)` or `df.fillna(df.mean())`
- Forward fill: `df.ffill()` (use previous value; `fillna(method='ffill')` is deprecated in recent Pandas)

Expected Output:
Missing values per column:
Name 0
Age 1
City 2
Salary 1
dtype: int64
After dropping rows with NaN:
Name Age City Salary
0 Alice 25 New York 50000
After filling NaN with 0:
Name Age City Salary
0 Alice 25.0 New York 50000
1 Bob 30.0 0 60000
2 Carol 0.0 Paris 0
Key Concepts:
- `.isna()` / `.isnull()`: Detect missing values (both equivalent)
- `.dropna()`: Remove rows/columns with missing values
- `.fillna()`: Replace missing values with a specified value or method

Code Location: Cell 13 (search for `# TODO 5:`)
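A minimal sketch of the missing-value workflow (a smaller frame than the expected output above, so counts differ; the methods are the same):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol'],
    'Age': [25.0, 30.0, np.nan],
    'City': ['New York', np.nan, 'Paris'],
    'Salary': [50000.0, 60000.0, np.nan],
})

missing = df.isna().sum()   # NaN count per column
complete = df.dropna()      # keep only fully populated rows
filled = df.fillna(0)       # replace every NaN with 0
print(missing)
```

Notice that a numeric column containing NaN is stored as float (`25.0`, not `25`), which is why the expected output above shows `Age` as `25.0` after filling.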
Estimated Time: 15 minutes
What You'll Build: Perform statistical aggregations and group-based calculations.
Success Criteria:
- `.groupby()`
- `.agg()` for custom aggregations

Hints:
- Basic stats: `df['Age'].mean()`, `df['Salary'].sum()`, `df['City'].value_counts()`
- Group and aggregate: `df.groupby('City')['Salary'].mean()`
- Multiple aggregations: `df.groupby('City').agg({'Salary': ['mean', 'sum'], 'Age': 'max'})`
- Group sizes: `df.groupby(['City', 'Department']).size().reset_index()`

Expected Output:
Average salary by city:
City
London 60000.0
New York 50000.0
Paris 55000.0
Name: Salary, dtype: float64
Multiple aggregations:
Salary Age
mean sum max
City
London 60000 60000 30
New York 50000 50000 25
Paris 55000 110000 28
Group size (count per city):
City
London 1
New York 1
Paris 2
dtype: int64
Key Concepts:
- `.groupby()`: Split data into groups based on column values
- `.agg()`: Apply multiple aggregation functions at once
- `.value_counts()`: Count occurrences of unique values

Code Location: Cell 15 (search for `# TODO 6:`)
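The grouping operations above can be sketched as follows (a hypothetical frame with a fourth row, `Dave`, added so one city has two members, matching the group sizes in the expected output):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'Dave'],   # 'Dave' is illustrative only
    'Age': [25, 30, 28, 26],
    'City': ['New York', 'London', 'Paris', 'Paris'],
    'Salary': [50000, 60000, 55000, 55000],
})

avg_salary = df.groupby('City')['Salary'].mean()              # one stat per group
multi = df.groupby('City').agg({'Salary': ['mean', 'sum'],    # several stats at once
                                'Age': 'max'})
sizes = df.groupby('City').size()                             # rows per group
print(avg_salary)
```

The dict-style `.agg()` produces a MultiIndex on the columns, which is why the expected output shows a two-row header (`Salary`/`Age` over `mean sum max`).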
Estimated Time: 15 minutes
What You'll Build: Combine multiple DataFrames using merge, join, and concat operations.
Success Criteria:
- `.merge()` (SQL-like joins)
- `.join()`

Hints:
- Vertical stack: `pd.concat([df1, df2], axis=0)`
- Horizontal stack: `pd.concat([df1, df2], axis=1)`
- Inner join: `pd.merge(df1, df2, on='key', how='inner')`
- Left join: `pd.merge(df1, df2, on='key', how='left')`
- Outer join: `pd.merge(df1, df2, on='key', how='outer')`
- Join on index: `df1.join(df2, how='left')`

Expected Output:
Concatenated DataFrames (vertical):
Name Age
0 Alice 25
1 Bob 30
0 Carol 28
1 David 35
Inner merge (matching keys only):
Name Age City Department
0 Alice 25 New York HR
1 Bob 30 London IT
Left merge (all from left, NaN for missing):
Name Age City Department
0 Alice 25 New York HR
1 Bob 30 London IT
2 Carol 28 Paris NaN
Key Concepts:
- `pd.concat()`: Stack DataFrames vertically (axis=0) or horizontally (axis=1)
- `.merge()`: SQL-style joins using common column(s)
- `.join()`: Shortcut for merging on index

Code Location: Cell 17 (search for `# TODO 7:`)
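A minimal sketch of the merge and concat patterns above (joining on `Name` and the `'Berlin'` row are illustrative assumptions, not the notebook's exact data):

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol'],
    'Age': [25, 30, 28],
    'City': ['New York', 'London', 'Paris'],
})
departments = pd.DataFrame({
    'Name': ['Alice', 'Bob'],       # Carol has no department row
    'Department': ['HR', 'IT'],
})

inner = pd.merge(people, departments, on='Name', how='inner')  # matching keys only
left = pd.merge(people, departments, on='Name', how='left')    # NaN where unmatched
more = pd.DataFrame({'Name': ['David'], 'Age': [35], 'City': ['Berlin']})
stacked = pd.concat([people, more], axis=0)                    # vertical stack
print(left)
```

Note that `concat` keeps the original row labels by default (hence the duplicate `0`, `1` indices in the expected output); pass `ignore_index=True` to renumber.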
.query() (Hard)

Estimated Time: 10 minutes
What You'll Build:
Use the .query() method for cleaner, SQL-like filtering syntax.
Success Criteria:
- `.query()`
- `@` symbol

Hints:
- Simple: `df.query('Age > 25')`
- Combined: `df.query('Age > 25 and City == "Paris"')`
- External variable: `threshold = 30; df.query('Age > @threshold')`
- List membership: `df.query('City in ["Paris", "London"]')`
- Columns with spaces: ``df.query('`Column Name` > 10')`` (use backticks)

Expected Output:
Query: Age > 25
Name Age City
1 Bob 30 London
2 Carol 28 Paris
Query: Age > 25 and City == 'Paris'
Name Age City
2 Carol 28 Paris
Query with variable (threshold=27):
Name Age City
1 Bob 30 London
2 Carol 28 Paris
Key Concepts:
- `.query()`: SQL-like filtering with string expressions
- `@variable_name` to reference external variables
- `.query()` can be faster for large DataFrames

Code Location: Cell 19 (search for `# TODO 8:`)
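The query patterns above, sketched on the sample data from the expected output:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol'],
    'Age': [25, 30, 28],
    'City': ['New York', 'London', 'Paris'],
})

older = df.query('Age > 25')                          # same result as df[df['Age'] > 25]
combo = df.query('Age > 25 and City == "Paris"')      # 'and'/'or' are fine inside query strings
threshold = 27
via_var = df.query('Age > @threshold')                # @ references a Python variable
print(via_var)
```

Inside a query string you use plain `and`/`or`, unlike boolean-mask filtering where `&`/`|` are required; that is a common source of confusion between the two styles.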
Run All Cells:
Data Validation:
DataFrame Inspection:
- `df.shape` to verify dimensions
- `df.dtypes` to check column types
- `df.head()` to preview data
- `df.info()` to check for missing values

Issue: "ModuleNotFoundError: No module named 'pandas'"

Solution:
# In Colab, pandas is pre-installed
# If using local Jupyter, install dependencies:
pip install -r requirements.txt
# Verify installation:
import pandas as pd
print(pd.__version__) # Should print version number
# Then restart kernel: Kernel → Restart
Issue: "KeyError: 'column_name'"

Solution:
- Print the actual column names: `print(df.columns.tolist())`
- Watch for hidden whitespace: `'Age '` vs `'Age'`
- Check you're using the right DataFrame: `df1['col']` when you meant `df2['col']`
- Verify the name against `df.columns`

Issue: "SettingWithCopyWarning: A value is trying to be set on a copy of a slice"

Solution:
- Cause: `df_subset = df[df['Age'] > 25]; df_subset['New'] = 0` (modifying a slice)
- Fix: use `.loc[]` for assignment, or take an explicit `.copy()`:
df.loc[df['Age'] > 25, 'New'] = 0
df_subset = df[df['Age'] > 25].copy()
df_subset['New'] = 0
Issue: "TypeError: 'Series' object is not callable"

Solution:
- Cause: calling a column like a function: `df.Age()` or `df['Age']()`
- Fix: access without parentheses: `df['Age']` or `df.Age`

df['Age']    # ✅ Select column
df.head() # ✅ Call method
df['Age']() # ❌ Error: Series is not callable
Issue: Memory error with large DataFrames Solution:
- Check memory use first: `df.memory_usage(deep=True).sum() / 1024**2` (MB)

# Convert to smaller data types
df['Age'] = df['Age'].astype('int8')  # If values fit in int8 (-128 to 127)
df['Category'] = df['Category'].astype('category') # For repeated strings
# Load data in chunks
for chunk in pd.read_csv('large_file.csv', chunksize=10000):
process(chunk)
# Use specific columns only
df = pd.read_csv('file.csv', usecols=['Col1', 'Col2'])
Issue: "ValueError: The truth value of a Series is ambiguous" Solution:
- Cause: using Python's `and`/`or` instead of Pandas' `&`/`|`
- Wrong: `df[(df['Age'] > 25) and (df['City'] == 'Paris')]`
- Right: `df[(df['Age'] > 25) & (df['City'] == 'Paris')]`
- Pandas boolean operators: `&` (AND), `|` (OR), `~` (NOT)
- Python keywords: `and`, `or`, `not` (for single values only)

Issue: Notebook works in Colab but fails locally

Solution:
import pandas as pd, numpy as np
print(f"Pandas: {pd.__version__}, NumPy: {np.__version__}")
pip install pandas==2.1.4 numpy==1.26.4
Inspect DataFrames at Every Step:
print(f"Shape: {df.shape}") # (rows, columns)
print(f"Columns: {df.columns}") # Column names
print(df.head()) # First 5 rows
print(df.info()) # Types and missing values
print(df.describe()) # Statistics for numeric columns
Check Data Types:
print(df.dtypes) # Column data types
# If Age is 'object' instead of 'int64', conversion needed:
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')
Find Missing Values:
print(df.isna().sum()) # Count NaN per column
print(df[df['Age'].isna()]) # Show rows with NaN in 'Age'
Test Filters Incrementally:
# Test one condition at a time
mask1 = df['Age'] > 25
print(f"Mask1 true count: {mask1.sum()}")
mask2 = df['City'] == 'Paris'
print(f"Mask2 true count: {mask2.sum()}")
combined = mask1 & mask2
print(f"Combined true count: {combined.sum()}")
print(df[combined])
Use .loc[] for Safe Assignment:
# ❌ May cause SettingWithCopyWarning:
df_subset = df[df['Age'] > 25]
df_subset['New'] = 0
# ✅ Safe approach:
df.loc[df['Age'] > 25, 'New'] = 0
Completed the main activity? Try these bonus challenges:
- `.duplicated()` and `.drop_duplicates()` [+15 min]
- `.apply()` with custom functions [+30 min]
- `.melt()` (wide to long) and `.pivot()` (long to wide) [+45 min]

Your activity is complete when:
- Saved your work (`Cmd+S` / `Ctrl+S`)

Once you complete this activity, you'll have:
This activity unlocks your ability to work with real datasets! Pandas is the industry standard for data manipulation in Python - you'll use these skills in every data analysis project.
Stuck on a TODO?
- `print(df.head())`, `print(df.info())`

Still stuck after 15 minutes?
Pro Tips:
- Use `.head()` to preview DataFrames before operations
- Check `.dtypes` (many errors come from wrong types)
- Use `.loc[]` for safe assignment (avoids SettingWithCopyWarning)

Remember: Learning Pandas is a journey, not a race. It's OK to:
Happy data wrangling! 🚀📊✨
This activity follows the 65-70% implementation methodology: core DataFrame operations provided, students complete targeted manipulation exercises