Practice and reinforce the concepts from Lesson 17
Time estimate: 45-60 minutes
⚠️ Important
Always create a copy of the Colab notebook before starting. This ensures:
- Your work is saved to your Google Drive
- You don't accidentally modify the template
- You can return to your work later
- Import pandas as pd and numpy as np
- Use pd.read_csv() to load the provided dataset into df
💡 If you get a "file not found" error, check that the CSV file path is correct in your Colab environment.
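The loading step can be sketched as follows. The inline CSV text and its column names are hypothetical stand-ins so the example runs without the provided file; in the notebook you would pass the real file path to pd.read_csv() instead.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the provided CSV (not the real assignment data).
# In the notebook, replace io.StringIO(csv_text) with the actual file path.
csv_text = """name,age,city,signup_date
Ana,34,Lisbon,2023-01-15
Ben,,Oslo,2023-02-20
Ana,34,Lisbon,2023-01-15
Cy,abc,Paris,not a date
"""

df = pd.read_csv(io.StringIO(csv_text))

# Displaying the DataFrame is a quick way to confirm the load worked.
print(df)
```

If the load fails with a path error in Colab, listing the working directory (for example with os.listdir('.')) is one way to confirm where the file actually lives.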
Step 3: Explore Your Data (10 minutes)
- Use df.head() to view the first 5 rows
- Use df.columns to list all column names
- Use df.info() to see data types and missing values
- Use df.shape to check the dataset dimensions
- Take notes on what needs cleaning
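A minimal sketch of the exploration calls listed above, run on a small made-up DataFrame (the columns and values are assumptions for illustration only, not the assignment dataset):

```python
import pandas as pd

# Hypothetical data used only to demonstrate the exploration calls.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Ana", "Cy"],
    "age": [34, None, 34, 51],
    "city": ["Lisbon", "Oslo", "Lisbon", None],
})

print(df.head())      # first 5 rows (all 4 here, since the frame is small)
print(df.columns)     # column names
df.info()             # dtypes and non-null counts; prints directly
print(df.shape)       # (rows, columns)
```

Note that df.columns and df.shape are attributes, so they take no parentheses, while df.head() and df.info() are methods.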
Step 4: Remove Unnecessary Columns (5 minutes)
- Identify columns that won't be used in analysis
- Use df.drop() with the column names
- Set axis=1 to indicate you're dropping columns
- Verify columns are removed with df.columns
Step 5: Handle Duplicate Values (5 minutes)
- Check for duplicates using df.duplicated().sum()
- View duplicate rows with df[df.duplicated()]
- Remove duplicates using df.drop_duplicates()
- Verify removal by checking the shape again
💡 Common Challenge
Sometimes you may want to treat rows as duplicates based on certain columns only. Use the subset parameter in drop_duplicates() to specify which columns to check.
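Steps 4 and 5 can be sketched together as below. The DataFrame and the "notes" column are hypothetical; the point is only to show the calls in sequence. Both df.drop() and df.drop_duplicates() return a new DataFrame by default, so the sketch assigns the result back to df.

```python
import pandas as pd

# Hypothetical data: "notes" plays the role of a column we do not need.
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "name": ["Ana", "Ben", "Ben", "Ana"],
    "notes": ["x", "y", "y", "z"],
})

# Step 4: drop the unneeded column, then verify it is gone.
df = df.drop(["notes"], axis=1)
print(df.columns.tolist())

# Step 5: count, inspect, and remove fully duplicated rows.
print(df.duplicated().sum())
print(df[df.duplicated()])
df = df.drop_duplicates()
print(df.shape)

# Using subset: rows count as duplicates when "name" alone repeats.
# The first occurrence of each name is kept by default.
unique_names = df.drop_duplicates(subset=["name"])
print(unique_names)
```

Whether to deduplicate on all columns or on a subset depends on what makes a record unique in your dataset, so it is worth writing that decision down in a comment.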
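- Count missing values per column with df.isnull().sum()
- Drop rows that contain missing values with df.dropna()
- Fill numeric gaps with the column mean using df.fillna(df.mean(numeric_only=True))
- Fill gaps with the most frequent value using df.fillna(df.mode().iloc[0])
- Check the current data types with df.dtypes
- Convert columns with pd.to_numeric(), pd.to_datetime(), or .astype('category')
- Confirm the changes with df.dtypes
💡 Tip
Use errors='coerce' in pd.to_numeric() and pd.to_datetime() to turn invalid values into NaN or NaT instead of raising an error.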
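The missing-value and type-conversion calls above might be combined like this. The data and column names are made up for illustration, and the order (convert first, then fill) is one reasonable choice rather than the required one: converting with errors='coerce' first means invalid entries become missing values that the fill step can then handle.

```python
import pandas as pd

# Hypothetical messy data: numbers stored as text, an invalid date, gaps.
df = pd.DataFrame({
    "age": ["34", "abc", None, "51"],
    "city": ["Lisbon", "Oslo", None, "Oslo"],
    "signup_date": ["2023-01-15", "not a date", "2023-03-01", "2023-04-10"],
})

print(df.isnull().sum())  # missing values per column before cleaning
print(df.dtypes)

# Convert types; invalid entries become NaN / NaT instead of raising.
df["age"] = pd.to_numeric(df["age"], errors="coerce")
df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")

# Fill the numeric column with its mean, the text column with its mode.
df["age"] = df["age"].fillna(df["age"].mean())
df["city"] = df["city"].fillna(df["city"].mode().iloc[0])
df["city"] = df["city"].astype("category")

# Drop rows where the date could not be parsed.
df = df.dropna(subset=["signup_date"])

print(df.isnull().sum())
print(df.dtypes)
```

Filling with the mean or mode changes the distribution of a column, so for the assignment it helps to note in a comment why you chose filling over dropping for each column.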
- Export the cleaned dataset with df.to_csv('cleaned_data.csv', index=False)
Problem: "No module named pandas" error
- Run !pip install pandas in a cell first
Problem: Data types not converting properly
- Use errors='coerce' to handle problematic values
Problem: Memory errors with large datasets
- Use the chunksize parameter when reading the CSV
- Work with a smaller sample using df.sample()
Problem: Cleaned data not saving
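For the last two problems, a rough sketch of how one might confirm that the export landed on disk and then read a file back in chunks. The data, the temporary output folder, and the chunk size of 4 are all assumptions chosen for illustration; in the notebook you would normally write to 'cleaned_data.csv' in the working directory.

```python
import os
import tempfile

import pandas as pd

# Hypothetical cleaned data standing in for your real DataFrame.
df = pd.DataFrame({
    "id": range(1, 11),
    "value": [x * 1.5 for x in range(1, 11)],
})

# Write to a temporary folder here so the sketch leaves no files behind.
out_path = os.path.join(tempfile.mkdtemp(), "cleaned_data.csv")
df.to_csv(out_path, index=False)

# Confirm the file exists and is not empty.
print(os.path.exists(out_path))
print(os.path.getsize(out_path) > 0)

# Read the file back in chunks instead of loading it all at once.
total_rows = 0
for chunk in pd.read_csv(out_path, chunksize=4):
    total_rows += len(chunk)
print(total_rows)

# Alternatively, explore a random sample of an already loaded DataFrame.
sample = df.sample(n=3, random_state=0)
print(sample)

# In Colab only, the file can then be downloaded with:
#   from google.colab import files
#   files.download(out_path)
```

With chunksize set, pd.read_csv() returns an iterator of DataFrames, so each chunk can be cleaned or summarized separately before moving on to the next.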
ℹ️ Helpful Resources
⚠️ Before You Submit
✅ Ensure all code cells have been run successfully
✅ Your cleaned dataset is exported and downloadable
✅ You've made a copy of the Colab notebook
✅ All steps are completed with comments explaining your approach
ℹ️ Submission Checklist
- Colab notebook link
- Cleaned CSV file
- Brief summary of cleaning steps taken
- Any challenges faced and how you solved them