:information_source: Project Overview
:bar_chart: Difficulty Level: Intermediate to Advanced
⏱️ Estimated Time: 3 weeks (L21-L23)
:dart: Skills Practiced:
Data cleaning and preprocessing
Exploratory data analysis (EDA)
Data visualization with Python
Statistical analysis and interpretation
Professional report writing
Technical presentation skills
Week 21: Dataset Selection & Cleaning
├── Choose dataset aligned with theme
├── Set up Google Colab environment
├── Implement data cleaning procedures
└── Begin exploratory analysis
Week 22: Analysis & Report Writing
├── Perform in-depth analysis
├── Create visualizations
├── Write comprehensive report
└── Prepare supporting evidence
Week 23: Presentation & Submission
├── Create presentation slides
├── Practice delivery
├── Present findings (live/recorded)
└── Submit all deliverables
This guide outlines the requirements, timeline, and evaluation criteria for the final data analysis project. The project involves selecting a dataset, performing data analysis, and presenting findings through code, documentation, and presentation.
| Week | Focus |
|------|-------|
| L21 | Dataset Selection, Data Cleaning, Initial Analysis |
| L22 | Report Development, Presentation Preparation |
| L23 | Project Presentation |
This project consists of three essential components:
Platform: Google Colab Notebook
Requirements:
Data cleaning operations
Data analysis techniques
Data manipulation
Data visualization
:warning: Critical Setup
Colab saves notebooks stored in Google Drive automatically, but do not rely on that alone. Use File -> Save a copy in Drive when you start, and save manually (File -> Save) before closing the tab to prevent losing your work.
Format: Google Docs (or equivalent document format)
Requirements:
Cover page
Overview/Introduction
Dataset background explanation
Clear project objectives
Results section (minimum 2 distinct results)
Each result must include at least 2 figures:
At least 1 graph/chart
At least 1 table
Observations about each result
Reasoning and interpretation
Supporting evidence from credible sources
Conclusion
Summary of key findings
Recommended actions based on results
Potential improvements and future work
References
Duration: 5-10 minutes
Format Options:
Live presentation
Video recording (uploaded to Google Drive)
Requirements:
Professional slides that include:
Title/Introduction
Project overview
Results with visualizations
Conclusion and recommendations
Clear explanation of:
Project background and objectives
Analysis results and significance
Conclusions and potential applications
Dataset Selection
Choose a CSV dataset relevant to your selected theme
Ensure sufficient complexity and data points
:bulb: Dataset Sources
Popular platforms for finding datasets:
Kaggle: Wide variety of real-world datasets
Data.gov: Government open data
UCI Machine Learning Repository: Academic datasets
Google Dataset Search: Search across multiple sources
Data Cleaning
Check for and handle duplicate entries
Identify and address missing values
Remove or transform irrelevant columns
Standardize data formats and values
Data Manipulation
Apply appropriate techniques:
Sorting and filtering
Grouping related data
Joining datasets when applicable
Transforming variables as needed
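The techniques listed above can be sketched in pandas as follows. The `apps` and `categories` tables, their column names, and all values are hypothetical and exist only to illustrate each operation; substitute your own dataset.

```python
import pandas as pd

# Hypothetical app data, made up purely for illustration.
apps = pd.DataFrame({
    "app": ["Alpha", "Beta", "Gamma", "Delta", "Epsilon"],
    "category_id": [1, 2, 1, 3, 2],
    "downloads": [5000, 12000, 800, 45000, 3000],
    "rating": [4.5, 3.9, 4.8, 4.1, 3.2],
})
categories = pd.DataFrame({
    "category_id": [1, 2, 3],
    "category": ["Game", "Education", "Social"],
})

# Sorting: most-downloaded apps first.
by_downloads = apps.sort_values("downloads", ascending=False)

# Filtering: keep only apps rated 4.0 or higher.
well_rated = apps[apps["rating"] >= 4.0]

# Joining: attach the category name to each app via the shared key.
merged = apps.merge(categories, on="category_id", how="left")

# Grouping: total downloads and mean rating per category.
per_category = (
    merged.groupby("category")
    .agg(total_downloads=("downloads", "sum"), mean_rating=("rating", "mean"))
    .reset_index()
)

# Transforming: derive a new column from an existing one.
merged["downloads_thousands"] = merged["downloads"] / 1000

print(per_category)
```

A left join keeps every row of `apps` even when a category is missing from the lookup table, which makes unexpected gaps easy to spot afterwards.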
Data Visualization
Create meaningful visualizations with:
Clear axes (x-axis, y-axis)
Descriptive title
Appropriate labels
Legend when using multiple data series
Consistent formatting and style
Your visualizations should effectively communicate your findings. Key components include:
Title: Descriptive and specific
Axes: Properly labeled with units
Legend: When multiple data series are displayed
Data Labels: When appropriate for clarity
Color Scheme: Logical and accessible
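As a minimal sketch of a chart that covers these components, the example below plots two series with Matplotlib. The player counts are invented for illustration and do not come from any real dataset.

```python
import matplotlib.pyplot as plt

# Hypothetical yearly figures, made up for illustration only.
years = [2020, 2021, 2022, 2023]
mobile_players = [1.8, 2.1, 2.3, 2.6]   # millions
console_players = [0.9, 1.0, 1.0, 1.1]  # millions

fig, ax = plt.subplots(figsize=(8, 5))

# Color scheme: two clearly distinguishable colors, one per series.
ax.plot(years, mobile_players, marker="o", color="tab:blue", label="Mobile")
ax.plot(years, console_players, marker="s", color="tab:orange", label="Console")

# Title: descriptive and specific.
ax.set_title("Active Players by Platform, 2020-2023")

# Axes: labeled, with units where they apply.
ax.set_xlabel("Year")
ax.set_ylabel("Active players (millions)")
ax.set_xticks(years)

# Legend: needed because two series share the axes.
ax.legend(title="Platform")

# Data labels: annotate only the final value of each series to avoid clutter.
for series in (mobile_players, console_players):
    ax.annotate(
        f"{series[-1]}",
        (years[-1], series[-1]),
        textcoords="offset points",
        xytext=(0, 8),
        ha="center",
    )

fig.tight_layout()
```

In Colab the figure is displayed when the cell finishes running.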
Select one of these broader themes and develop a specific research question:
AI chatbots and their adoption in 2023
Communication app usage patterns
IoT device popularity and market trends
Highest-rated games (2020-2023)
Online gaming participation metrics
E-sports growth and demographics
Choose from the 17 UN SDGs for analysis
Focus on measurable metrics and trends
Social media's impact on mental health
Technology in education
COVID-19 economic impact analysis
STEM education performance factors
:memo: Theme Selection Tip
Choose a theme that genuinely interests you! Your enthusiasm will show in your analysis and presentation. Consider topics related to your hobbies, career interests, or current events.
Projects will be evaluated on a scale of 1-5 for each of these criteria:
Implementation of required techniques
Code organization and clarity
Independent problem-solving ability
Technical sophistication
Clarity of objectives and methodology
Depth of analysis and insights
Quality of data presentation
Logical flow and organization
Supporting evidence and references
Alignment between objectives and results
Relevance and practical significance
Depth of insights derived from data
Quality of supporting evidence
Slide design and organization
Clarity of visual elements
Appropriate level of detail
Professional formatting
Clear communication of key points
Logical flow of presentation
Demonstrated understanding of material
Ability to explain technical concepts clearly
Time management
:warning: Grading Note
Each criterion is equally important! Don't focus solely on code at the expense of your report or presentation. A balanced approach leads to the best final grade.
Begin by exploring the dataset structure before cleaning
Document each step of your data cleaning process
Create reusable functions for common operations
Include comments explaining your reasoning
Start with a clear statement of your research question
Provide context for why this analysis matters
Be specific about methodology and limitations
Connect your findings to practical applications
Focus on key insights rather than technical details
Practice your presentation to ensure timing
Prepare for potential questions
Highlight the most interesting or surprising findings
"FileNotFoundError" when loading CSV
Solution: Upload file to Colab first using the files panel
Alternative: Use direct URL if dataset is online
Missing values causing errors
Solution: Use df.fillna() or df.dropna() appropriately
Check with df.isnull().sum() first
Data type mismatches
Solution: Convert columns using pd.to_numeric() or pd.to_datetime()
Always check dtypes with df.dtypes
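The missing-value and data-type fixes above might look like this in practice. The small DataFrame is hypothetical and was built to contain the typical problems: numbers and dates stored as text, plus a gap in a numeric column.

```python
import pandas as pd

# Hypothetical raw data where numbers and dates arrived as text.
df = pd.DataFrame({
    "price": ["10.5", "20", "n/a", "7.25"],
    "released": ["2021-01-15", "2022-03-01", "not known", "2023-07-30"],
    "rating": [4.5, None, 3.8, 4.1],
})

print(df.dtypes)          # price and released are stored as text, not numbers/dates
print(df.isnull().sum())  # missing values per column before any conversion

# errors="coerce" turns entries that cannot be parsed into NaN / NaT
# instead of raising an exception.
df["price"] = pd.to_numeric(df["price"], errors="coerce")
df["released"] = pd.to_datetime(df["released"], errors="coerce")

# Re-check: the unparseable entries now show up as missing values.
print(df.isnull().sum())

# Fill a numeric gap with the column median ...
df["rating"] = df["rating"].fillna(df["rating"].median())

# ... and drop rows where an essential value is still missing.
df = df.dropna(subset=["price"])

print(df)
```

Whether to fill or drop depends on the column and the analysis; the choices here are only one reasonable option.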
Overlapping labels
Solution: Rotate labels with plt.xticks(rotation=45)
Or use plt.tight_layout()
Missing legend or labels
Solution: Always add plt.xlabel(), plt.ylabel(), and plt.title()
Use plt.legend() for multiple series
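A short sketch of both visualization fixes together, using the `pyplot` calls named above. The genre names and ratings are made up for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical values; the long category names are what cause the overlap.
genres = [
    "Role-Playing Games",
    "First-Person Shooters",
    "Real-Time Strategy",
    "Massively Multiplayer Online",
    "Puzzle and Board Games",
]
ratings_2022 = [4.1, 3.9, 3.7, 4.0, 4.3]
ratings_2023 = [4.3, 3.8, 3.9, 4.2, 4.4]

positions = list(range(len(genres)))
width = 0.4

plt.figure(figsize=(9, 5))
plt.bar([p - width / 2 for p in positions], ratings_2022, width=width, label="2022")
plt.bar([p + width / 2 for p in positions], ratings_2023, width=width, label="2023")

# Rotate the tick labels so the long names do not overlap.
plt.xticks(positions, genres, rotation=45, ha="right")

plt.xlabel("Genre")
plt.ylabel("Average rating (out of 5)")
plt.title("Average Game Rating by Genre, 2022 vs 2023")
plt.legend(title="Year")  # two series, so a legend is needed

# Adjust the margins so rotated labels are not cut off.
plt.tight_layout()
```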
:bulb: Pro Debugging Tip
When stuck, use print() statements liberally to understand your data at each step. Check shapes with df.shape and preview with df.head().
Final submission should include:
Google Colab notebook(s) with all code
Complete report document
Video recording
Any supplementary materials referenced
Please double-check your project before submitting it. Ensure you have included the Python code file, Report, and video explanation as required.
:warning: Pre-Submission Checklist
:link: Submit your project here
:information_source: Week 21 Focus
This section provides specific guidance for Week 21 activities of the final project, focusing on dataset selection, data cleaning, and initial analysis setup. These foundational steps will prepare you for the analysis and reporting phases in subsequent weeks.
By the end of Week 21, you should have:
Selected an appropriate dataset for your project
Created a Google Colab notebook for your analysis
Implemented data cleaning procedures
Begun initial exploratory data analysis
Choose a CSV dataset that aligns with one of the project themes:
Technology Applications (AI chatbots, communication apps, IoT devices)
Gaming and Entertainment (game ratings, online gaming, e-sports)
Sustainable Development Goals (any of the 17 UN SDGs)
Alternative Topics (social media effects, educational technology, etc.)
Dataset requirements:
Must be in CSV format
Should contain sufficient data points for meaningful analysis
Should have multiple variables to explore relationships
Must be relevant to your chosen theme/topic
Recommended sources:
Kaggle datasets
Data.gov and other open data portals
Industry/academic repositories
Public datasets from research institutions
Create a new Google Colab notebook with the following sections:
```python
# [Your code for importing and exploring the data]

# [Your code for cleaning operations]

# [Your code for initial exploratory analysis]
```
:warning: Colab Setup Tip
Start your notebook with these essential imports:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
```
Your data cleaning section should include operations to:
Check for duplicates
Identify how many duplicate rows exist in your dataset
Decide whether to remove duplicates based on your analysis needs
Implement appropriate duplicate removal if necessary
Identify missing values
Calculate and display the number of missing values per column
Determine the best approach for handling missing values:
Removal (if rows/columns have too many missing values)
Imputation (filling with mean, median, mode, or predicted values)
Keeping as missing (if appropriate for your analysis)
Implement your chosen approach
Assess column relevance
Review all columns to determine their relevance to your objectives
Identify columns with little or no analytical value
Remove unnecessary columns that won't contribute to your analysis
Document your reasoning for removing any columns
Standardize data formats
Ensure text data is properly formatted (e.g., consistent case, no extra spaces)
Verify date/time data is in a consistent format
Check that numeric data has appropriate decimal precision
Convert categorical variables to the appropriate format for analysis
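The four cleaning operations above could be sketched as one short pipeline. The `raw` DataFrame, its column names, and the decision to treat `internal_id` as irrelevant are all assumptions made for this illustration; your own choices should follow from your dataset and objectives.

```python
import pandas as pd

# Hypothetical messy data, made up for illustration.
raw = pd.DataFrame({
    "Name": [" alice ", "BOB", "BOB", "Carol", "dave"],
    "Age": [25, 31, 31, None, 42],
    "City": ["kuala lumpur", "Penang ", "Penang ", "IPOH", None],
    "internal_id": ["x1", "x2", "x2", "x3", "x4"],
})

df = raw.copy()  # keep the original untouched

# 1. Duplicates: count fully identical rows, then remove them.
n_duplicates = df.duplicated().sum()
df = df.drop_duplicates()

# 2. Missing values: count per column, then handle each column deliberately.
missing_per_column = df.isnull().sum()
df["Age"] = df["Age"].fillna(df["Age"].median())  # impute a numeric gap
df = df.dropna(subset=["City"])                   # drop rows missing a key field

# 3. Column relevance: internal_id is assumed to have no analytical value here.
df = df.drop(columns=["internal_id"])

# 4. Standard formats: tidy column names, text case, whitespace, and dtypes.
df.columns = df.columns.str.lower()
df["name"] = df["name"].str.strip().str.title()
df["city"] = df["city"].str.strip().str.title()
df["age"] = df["age"].astype(int)
df["city"] = df["city"].astype("category")

print(f"Duplicates removed: {n_duplicates}")
print(missing_per_column)
print(df)
```

In a notebook, each numbered step would normally sit in its own cell with a markdown note explaining the reasoning behind it.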
After cleaning, perform initial exploratory analysis:
Basic statistics
Calculate descriptive statistics for numeric columns
Note key insights about central tendency and variation
Frequency analysis
Determine the distribution of categorical variables
Identify the most/least common categories
Look for imbalanced categories that might affect analysis
Simple visualizations
Create appropriate visualizations to understand data distribution
Explore potential relationships between variables
Document initial observations about patterns in the data
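A minimal sketch of these three exploratory steps, assuming a small hypothetical games table (the columns and values are invented for illustration):

```python
import pandas as pd

# Hypothetical cleaned data, made up for illustration.
df = pd.DataFrame({
    "genre": ["Action", "Puzzle", "Action", "Sports", "Action", "Puzzle"],
    "rating": [4.2, 3.8, 4.6, 3.5, 4.0, 4.4],
    "hours_played": [120, 35, 200, 60, 150, 40],
})

# Basic statistics: count, mean, std, min, quartiles, max for numeric columns.
summary = df.describe()
print(summary)

# Frequency analysis: absolute counts and shares per category.
genre_counts = df["genre"].value_counts()
genre_share = df["genre"].value_counts(normalize=True)
print(genre_counts)
print(genre_share.round(2))

# A first look at a possible relationship between two numeric variables.
correlation = df["rating"].corr(df["hours_played"])
print(f"Correlation between rating and hours played: {correlation:.2f}")

# Simple visualization: distribution of one numeric column.
ax = df["rating"].plot(kind="hist", bins=5, title="Distribution of Ratings")
ax.set_xlabel("Rating")
```

A correlation computed on a handful of rows like this is only a starting point for questions, not a finding in itself.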
Before moving to Week 22, ensure you have:
:memo: Quality Check
After cleaning, your dataset should have:
No unexpected null values
Consistent data types across columns
Logical value ranges (no negative ages, etc.)
Clear column names that describe the data
Data loading considerations:
Understand how to upload files to Google Colab
Know how to read CSV files using pandas
Be familiar with options for handling different CSV formats (delimiters, headers, etc.)
Verification practices:
Always check your dataset before and after each cleaning operation
Compare row and column counts to ensure you're not losing data unexpectedly
Review a sample of the data to confirm cleaning operations worked as expected
Backup strategy:
Create a copy of your original data before making changes
Work with copies of your data frame when performing cleaning operations
This allows you to revert to previous stages if needed
Documentation importance:
Add markdown cells explaining your reasoning for each cleaning decision
Note any assumptions you're making about the data
Record any patterns or anomalies you discover during cleaning
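The loading, backup, and verification practices above can be combined in a few lines. In this sketch `io.StringIO` stands in for an uploaded file so the example runs anywhere; in Colab you would pass the uploaded file's name to `pd.read_csv` instead. The helper `report_change` is a hypothetical name introduced here, not part of pandas.

```python
import io

import pandas as pd

# Stand-in for an uploaded CSV that uses ";" as its delimiter (made-up data).
csv_text = "name;score\nAli;80\nBea;\nAli;80\n"

# sep tells pandas which delimiter the file uses; the default is ",".
original = pd.read_csv(io.StringIO(csv_text), sep=";")


def report_change(step, before, after):
    """Print and return how many rows and columns a cleaning step removed."""
    rows_lost = before.shape[0] - after.shape[0]
    cols_lost = before.shape[1] - after.shape[1]
    print(f"{step}: {before.shape} -> {after.shape} "
          f"({rows_lost} rows, {cols_lost} columns removed)")
    return rows_lost, cols_lost


# Backup: work on a copy so the original stays available for comparison.
df = original.copy()

# Verification: compare shapes before and after each operation.
deduped = df.drop_duplicates()
lost_dupes = report_change("drop_duplicates", df, deduped)

cleaned = deduped.dropna()
lost_na = report_change("dropna", deduped, cleaned)

# Review a sample to confirm the result looks as expected.
print(cleaned.head())
```

Keeping each stage in its own variable (`df`, `deduped`, `cleaned`) makes it possible to go back one step without reloading the file.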
After completing Week 21 tasks, you will be prepared to:
Perform in-depth data analysis
Create meaningful visualizations
Begin drafting your analysis report
Identify key insights from your data
Keep your cleaned dataset and initial analysis ready for these next steps.
:information_source: Week 22 Focus
This section provides specific guidance for Week 22 activities of the final project, focusing on creating a comprehensive analysis report based on the data analysis performed in Week 21. Your report will document your findings, observations, and conclusions in a clear, structured format.
By the end of Week 22, you should have:
Created a Google Doc for your analysis report
Completed all required sections of the report
Included at least 2 distinct results with supporting tables and graphs
Ensured all visual elements are properly labeled and explained
Finalized the report with appropriate formatting and citations
Your report must include the following sections:
The cover page should include:
Title of your project
"Analysis Report" subtitle
Your name (or team members' names)
Group/team identifier (if applicable)
Class information
The overview section should include:
Background: Explain how you obtained your dataset
Describe the source of your data (e.g., "The mobile apps data is from Kaggle.com")
Provide context about the dataset (size, time period covered, limitations)
Explain any preprocessing steps you performed before analysis
Objectives: State clearly what you aimed to discover from the data
Frame objectives as specific questions or hypotheses
Explain why these objectives are relevant or interesting
Example: "To study which app categories have the highest number of downloads"
You must include at least 2 distinct results in your report. For each result:
Tables and Graphs:
Include at least 1 table and 1 graph for each result
Ensure all tables and graphs have clear titles
Label axes, legends, and data points appropriately
Use consistent formatting and color schemes
Observation:
Describe what the data shows in your tables and graphs
Point out patterns, trends, outliers, or notable findings
Use specific numbers and values from your analysis
Keep observations factual and evidence-based
Reason:
Provide your interpretation of why these patterns exist
Explain potential causes for the trends observed
Connect your findings to your initial objectives
This section should reflect your own analytical thinking
Support:
Reference external sources that support your reasoning
Include information from academic papers, industry reports, or credible articles
Explain how these external sources relate to your findings
All supporting references must be cited in your references section
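For the table-and-graph requirement, one possible approach is to build the summary table first and draw the graph from that same table, so the two always agree. The data below is hypothetical and loosely follows the app-category example used in this guide.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical app data, made up for illustration.
apps = pd.DataFrame({
    "category": ["Game", "Game", "Education", "Education", "Social", "Social"],
    "downloads": [50000, 30000, 12000, 8000, 90000, 10000],
})

# Table: one row per category, sorted so the top result comes first.
result_table = (
    apps.groupby("category")["downloads"]
    .agg(apps="count", total_downloads="sum", mean_downloads="mean")
    .sort_values("total_downloads", ascending=False)
    .reset_index()
)
print(result_table.to_string(index=False))

# Graph: drawn from the same table, with a title and labeled axes.
fig, ax = plt.subplots(figsize=(7, 4))
ax.bar(result_table["category"], result_table["total_downloads"], color="tab:blue")
ax.set_title("Total Downloads by App Category")
ax.set_xlabel("App category")
ax.set_ylabel("Total downloads")
fig.tight_layout()
```

To place the graph in the report, `fig.savefig("result_1.png", dpi=150)` would export it as an image file (the file name is only an example). The printed table gives the specific numbers to quote in the Observation section.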
The conclusion should synthesize your findings and include:
Summary:
Briefly recap the main results of your analysis
Relate findings back to your original objectives
Highlight the most significant discoveries
Suggestions:
Propose actions that could be taken based on your results
Explain who might benefit from these suggestions
Connect suggestions directly to your findings
Improvements:
Discuss how your analysis could be enhanced in the future
Identify limitations in your current approach
Suggest additional data or methods that could strengthen the analysis
List all external sources you referenced in your report
Include full URLs for online sources
Use a consistent citation format
Ensure all supporting evidence in your "Support" sections is properly cited
Clarity: Explain concepts in simple, understandable language
Avoid jargon unless necessary and defined
Use concrete examples to illustrate abstract points
Thoroughness: Provide complete explanations
Don't leave logical gaps in your reasoning
Include sufficient detail for readers to follow your analysis
Evidence-based: Support claims with data
Reference specific numbers from your analysis
Don't make unsupported generalizations
Relevance: Stay focused on your objectives
Every section should contribute to answering your research questions
Avoid tangential information
Tables:
Keep tables clean and readable
Include clear headers for all columns
Format numbers consistently (decimal places, units)
Include a title that describes the content
Graphs:
Choose appropriate graph types for your data
Label all axes with units where applicable
Include a legend if multiple data series are present
Use colors thoughtfully for clarity
Formatting:
Use consistent fonts and text sizes
Apply appropriate spacing between sections
Use headings and subheadings to organize content
Include page numbers for longer reports
Missing connections: Failing to relate your results to your objectives
Insufficient analysis: Just describing data without interpretation
Unsupported claims: Making statements without evidence
Poor visual presentation: Cluttered or mislabeled graphs and tables
Inconsistent formatting: Varying styles throughout the document
Lack of citations: Not acknowledging external sources
:warning: Critical Report Mistakes
The most common reasons for low report scores:
Writing observations without explaining WHY patterns exist
Forgetting to cite external sources in the support section
Using low-quality visualizations without proper labels
A template has been provided for your convenience. It includes:
Correctly formatted cover page
Section headers for all required components
Placeholders for results with sample image placements
Space for conclusion and references
You can access the template here: Report Template
Make a copy of the template:
Open the template link
Go to File > Make a copy
Name your document appropriately (e.g., "Final Project Analysis Report - [Your Name/Group]")
Template structure:
The template includes the following pre-formatted sections:
Title and Analysis Report
Name:
Group:
Class:
Overview
[Space for your overview paragraph]
Results
1. Result number 1
[Placeholders for table and graph images]
2. Result number 2
[Placeholders for table and graph images]
Conclusion
[Space for your conclusion paragraph]
Reference
[Numbered list for your references]
Inserting your visualizations:
Replace the image placeholders with your actual tables and graphs
In Google Docs, use Insert > Image to add your visualizations
Make sure to position and resize images appropriately
Formatting guide:
Maintain consistent font styles (the template uses default Google Docs fonts)
Use the built-in heading styles for section titles
Keep reasonable margins and spacing between elements
Number your results sections as shown in the template
Before moving to Week 23, ensure you have:
:bulb: Final Report Review
Ask a friend or family member to read your report. If they can understand your findings without seeing your code, you've written a good report!
After completing the analysis report in Week 22, you will be prepared to:
Create a presentation based on your report
Rehearse your presentation delivery
Finalize all project deliverables
Keep your completed report accessible as you will reference it when creating your presentation slides.
:information_source: Week 23 Focus
This section provides specific guidance for Week 23 activities of the final project, focusing on delivering an effective presentation of your data analysis findings. This is the culmination of your project work from Weeks 21-22 and your opportunity to showcase your analysis, insights, and communication skills.
By the end of Week 23, you should have:
Finalized your presentation slides
Delivered your presentation (live or recorded)
Demonstrated your understanding of the data analysis process
Effectively communicated your findings and their significance
You have two presentation format options:
Deliver your presentation live during class time
Suitable for groups with all members present
Ideal for those with stable internet connections
Allows for immediate Q&A and feedback
Record your presentation in advance
Suitable for groups whose members cannot all be present
Good option for those with unstable internet connections
Must be uploaded to Google Drive before the deadline
Time Limit: 5-10 minutes
Participation: All members must present a portion of the material
Sequence: Presentation order will be determined randomly using Picker Wheel
The sequence will be posted on the Telebort Dashboard
Be prepared to present when your turn comes
Your presentation must include the following sections:
Project title
Your name (or team members' names)
Class information
Introduction to your project topic
Background information on your dataset
Clear statement of your objectives
Present at least 2 distinct results from your analysis
Each result should include:
At least one graph and one table
Clear explanation of what the data shows
Your interpretation of the findings
Supporting evidence from credible sources
Summary of key findings
Suggestions for actions based on your results
Potential improvements or next steps
Use concise, point-form text
Break long sentences into bullet points
Example: "The category with the most apps: Game (301)"
Instead of lengthy paragraphs
Choose comfortable background colors
Soft, neutral colors (light beige, pale blue, soft green)
Consistent color scheme throughout
High contrast between text and background
Use appropriate fonts and styles
Readable fonts (Arial, Calibri, Georgia)
Consistent font usage throughout
Adequate font size (min. 24pt for body text, 32pt for headings)
Include relevant photos and animations
Visual elements that support your message
Clean, professional graphics
Appropriate charts and diagrams
Avoid lengthy sentences on slides
Don't put full paragraphs on slides
Don't include excessive detail in text form
Avoid distracting backgrounds
No bright, jarring colors
No busy patterns
No excessive graphics
Avoid fancy or hard-to-read fonts
No decorative or script fonts
No mixing too many different fonts
No font sizes that are too small
Avoid distracting images
No irrelevant pictures
No excessive animations
No cluttered visuals
:bulb: Slide Design Rule
Follow the 6-6-6 rule: No more than 6 bullet points per slide, 6 words per bullet, and spend about 6 seconds explaining each point.
Be confident
Stand (or sit) straight
Speak clearly and at an appropriate volume
Make eye contact with the audience or camera
Explain in your own words
Don't read directly from the slides
Show your understanding of the material
Use natural, conversational language
Show your passion
Demonstrate enthusiasm for your topic
Vary your tone to emphasize key points
Share why your findings matter
Ensure all team members participate
Divide speaking parts equally
Support each other during transitions
Acknowledge team contributions
Don't appear too nervous
Practice beforehand to build confidence
Remember: everyone is supportive
Take deep breaths if feeling anxious
Don't read slides verbatim
Use slides as prompts, not a script
Add value beyond what's written on slides
Look at the audience, not just the screen
Don't be casual or unprepared
Treat the presentation professionally
Practice your timing
Anticipate potential questions
Don't let only one person dominate
Ensure balanced participation
Practice smooth transitions between speakers
Support team members if they struggle
:memo: Presentation Confidence Booster
Remember: You're the expert on your analysis! You've spent weeks working on this project. Your audience wants to learn from you, not judge you.
Practice your presentation multiple times
Rehearse individually and as a team
Time your presentation to stay within limits
Practice with the actual slides you'll use
Prepare for technical issues
Have a backup of your presentation
Test all equipment beforehand
Have a contingency plan
Anticipate questions
Consider what questions might arise
Be prepared to elaborate on your findings
Know your data well enough to provide additional insights
For video presentations
Ensure good lighting and clear audio
Find a quiet location without background noise
Test your recording setup before the final recording
For professional-looking slide templates, consider these sources:
SlidesMania - Free templates for various presentation styles
SlidesGo - Modern, creative presentation templates
Canva - A variety of animated presentation templates
Before your presentation day, ensure you have:
:warning: Last-Minute Check
Test your technology setup at least 1 hour before presentation time:
Internet connection stability
Microphone and camera functionality
Screen sharing capabilities
Backup plan if technical issues arise
Your presentation is the culmination of weeks of work - be proud of what you've accomplished!
Focus on clearly communicating your findings and their significance
Remember that effective presentation skills are valuable beyond this project
The presentation evaluation focuses on both content quality and delivery style
Approach this as an opportunity to share your discoveries, not just as an assessment
Good luck with your presentation!
Ready to go above and beyond? Try these advanced challenges:
Machine Learning Integration
Apply a simple ML model to your dataset
Make predictions based on your data
Compare different algorithms
Interactive Dashboard
Create an interactive visualization using Plotly
Add filters and controls for data exploration
Deploy using Google Colab's sharing features
Statistical Testing
Perform hypothesis testing on your findings
Calculate confidence intervals
Validate your conclusions statistically
Data Storytelling
Create a narrative arc for your presentation
Use analogies to explain complex findings
Design custom visualizations
Professional Polish
Create a one-page executive summary
Design an infographic of key findings
Record a professional video with editing
Real-World Application
Contact a relevant organization with your findings
Propose actionable recommendations
Create an implementation plan
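For the Statistical Testing challenge, one approachable way to get a confidence interval is a percentile bootstrap, sketched below with NumPy only. This is a swapped-in technique chosen for simplicity, not a prescribed method; a formal t-test would be another option. The two groups of ratings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed so the result is repeatable

# Hypothetical ratings for two groups, made up for illustration.
group_a = np.array([4.1, 3.8, 4.5, 4.0, 4.3, 3.9, 4.4, 4.2])
group_b = np.array([3.6, 3.9, 3.5, 4.0, 3.7, 3.4, 3.8, 3.6])

# The statistic of interest: difference between the two group means.
observed_diff = group_a.mean() - group_b.mean()

# Bootstrap: resample each group with replacement many times and
# recompute the difference in means for every resample.
n_resamples = 10_000
boot_diffs = np.empty(n_resamples)
for i in range(n_resamples):
    sample_a = rng.choice(group_a, size=group_a.size, replace=True)
    sample_b = rng.choice(group_b, size=group_b.size, replace=True)
    boot_diffs[i] = sample_a.mean() - sample_b.mean()

# 95% percentile interval: the middle 95% of the resampled differences.
ci_low, ci_high = np.percentile(boot_diffs, [2.5, 97.5])

print(f"Observed difference in means: {observed_diff:.3f}")
print(f"95% bootstrap CI: [{ci_low:.3f}, {ci_high:.3f}]")
```

If the interval does not include 0, that is some evidence the two groups really differ, although with samples this small the conclusion should be stated cautiously in the report.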
:bulb: Excellence Recognition
Students who complete extension challenges may receive special recognition or bonus points. Check with your instructor about opportunities for presenting exceptional work!
Code with AI: Try using AI to help with dataset selection and cleaning.
Prompts:
"Suggest appropriate datasets for analyzing [your topic], preferably in CSV format."
"Help me identify potential data quality issues in this dataset: [paste first 10 rows of your data]."
"Write code to clean this dataset by handling missing values, removing duplicates, and standardizing formats: [provide data description or sample]."
"What columns from this dataset are most relevant for analyzing [your specific objective]?"
Code with AI: Try using AI to assist with report writing and visualization design.
Prompts:
"Create a clear overview section for my analysis report on [topic] that includes background and objectives."
"Help me interpret this visualization: [describe or share graph]. What key observations can I make?"
"Suggest ways to present this finding more effectively: [describe your current result and visualization]."
"Draft a conclusion section summarizing these results and suggesting next actions: [list your key findings]."
"How can I properly cite this source in my report: [provide source details]?"
Code with AI: Try using AI to help create effective presentation content and improve delivery.
Prompts:
"Convert this paragraph into concise bullet points for my presentation slide: [paste paragraph]."
"Suggest a clean, professional slide design for presenting data analysis on [your topic]."
"What are the most important aspects of my analysis to highlight in a 5-minute presentation?"
"Help me create a script for presenting this slide that sounds natural and engaging: [describe slide content]."
"How can I effectively transition between these two sections of my presentation: [describe the sections]?"