Student starter code (30% baseline)
- index.html - Main HTML page
- script.js - JavaScript logic
- styles.css - Styling and layout
- package.json - Dependencies
- setup.sh - Setup script
- README.md - Instructions (below)
💡 Download the ZIP, extract it, and follow the instructions below to get started!
Estimated Time: 60 minutes
Course: AI-1 Data Analysis and Data Science
Activity: 13
Master data visualization fundamentals! This activity teaches you to create professional line plots, bar charts, and scatter plots - essential skills for communicating data insights.
By the end of this activity, you will be able to:
Choose your development environment based on your needs:
Best for: Local development with cloud runtime consistency
Setup (first time only):
Open in VS Code:
code activity-13-matplotlib.ipynb
Select Google Colab Kernel:
Cmd+Shift+P -> Type "Select Kernel" -> Choose "Google Colab"
Authenticate:
Run First Cell:
Advantages:
First Connection: ~30-60 seconds
Subsequent Connections: ~10-15 seconds
Best for: Quick access from any device, no setup required
Setup:
Open in Browser:
activity-13-matplotlib.ipynb
Run Cells:
Shift+Enter
Advantages:
Note: Web Colab has the same free tier limits as VS Code integration
Best for: Offline work, quick edits, debugging
Setup:
# Install dependencies (one-time)
pip install -r requirements.txt
# Start Jupyter Notebook
jupyter notebook activity-13-matplotlib.ipynb
# Or use Jupyter Lab (modern interface)
jupyter lab activity-13-matplotlib.ipynb
Advantages:
Limitations:
Use When: Quick edits, no internet, testing local changes
Before starting, understand these session limits:
| Limit | Value | Impact |
|---|---|---|
| Session Duration | 12 hours max | Restart every 12 hours |
| Idle Timeout | 90 minutes | Disconnects if inactive |
| RAM | 12GB | Sufficient for this activity |
💡 Tips:
- Cmd+S (Mac) or Ctrl+S (Windows)
65% of the code is implemented for you:
Before jumping into TODOs:
Estimated Time: 10 minutes
What You'll Build: Create a line plot showing monthly sales trends over a year.
Success Criteria:
- plt.plot()
- plt.show()
Hints:
- plt.plot(x_data, y_data) to create line plot
- plt.xlabel("text"), plt.ylabel("text")
- plt.title("text")
- plt.show() to display the plot
Expected Output:
[Line plot showing upward sales trend from January to December]
- X-axis: Months (Jan-Dec)
- Y-axis: Sales values (0-10000)
- Blue line connecting data points
Key Concepts:
Code Location: Cell 4 (search for # TODO 1:)
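One way the hints above might come together is sketched below. The month labels and sales figures are made up for illustration and are not the notebook's dataset; the axis labels and title text are likewise assumptions.

```python
import matplotlib.pyplot as plt

# Hypothetical data for illustration; the notebook supplies its own dataset.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
sales = [2100, 2400, 3000, 3300, 4100, 4800,
         5200, 6100, 6900, 7800, 8600, 9500]

plt.figure()                           # start a fresh figure
plt.plot(months, sales, color="blue")  # one blue line through the monthly values
plt.xlabel("Month")
plt.ylabel("Sales")
plt.title("Monthly Sales Trend")
plt.ylim(0, 10000)                     # match the 0-10000 range in the expected output
ax = plt.gca()                         # keep a handle so the plot can be inspected later
plt.show()
```

Passing the month names directly as x values lets Matplotlib treat them as categories, so no manual tick setup is needed.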
Estimated Time: 12 minutes
What You'll Build: Create a line plot comparing sales trends for two products.
Success Criteria:
Hints:
- plt.plot(x, y1, label="Product A") then plt.plot(x, y2, label="Product B")
- color='blue' or color='red'
- linestyle='-' (solid) or linestyle='--' (dashed)
- plt.legend()
Expected Output:
[Line plot with two lines]
- Blue solid line: Product A sales
- Red dashed line: Product B sales
- Legend in corner identifying each product
Key Concepts:
Code Location: Cell 6 (search for # TODO 2:)
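As a rough sketch of the two-line comparison described above, the snippet below plots two series on the same axes and adds a legend. The sales numbers for both products are invented for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical data for illustration only.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
product_a = [1200, 1500, 1700, 2100, 2400, 2800]
product_b = [900, 1100, 1600, 1500, 1900, 2200]

plt.figure()
# Each plt.plot call adds one line; label= supplies the legend text.
plt.plot(months, product_a, label="Product A", color="blue", linestyle="-")
plt.plot(months, product_b, label="Product B", color="red", linestyle="--")
plt.xlabel("Month")
plt.ylabel("Sales")
plt.title("Product A vs Product B")
plt.legend()        # builds the legend from the label= arguments
ax = plt.gca()      # handle for later inspection
plt.show()
```

The legend is only populated for lines that were given a label, so a missing legend entry usually means a missing `label=` argument.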
Estimated Time: 10 minutes
What You'll Build: Create a styled line plot with markers, grid, and custom figure size.
Success Criteria:
Hints:
- marker='o' (circle), marker='s' (square)
- plt.grid(True)
- plt.figure(figsize=(10, 6)) before plotting
- linewidth=2, markersize=8
Expected Output:
[Line plot with enhanced styling]
- Data points marked with circles
- Grid lines visible
- Larger figure size (10x6)
- Thicker line for visibility
Key Concepts:
Code Location: Cell 8 (search for # TODO 3:)
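The styling options listed in the hints can be combined in a single call, roughly as follows. The data values and label text are assumptions made for the example.

```python
import matplotlib.pyplot as plt

# Hypothetical data for illustration only.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [3200, 3500, 3100, 4200, 4600, 5100]

fig = plt.figure(figsize=(10, 6))   # set the size before any plotting call
plt.plot(months, sales, marker="o", linewidth=2, markersize=8)
plt.grid(True)                      # show grid lines behind the data
plt.xlabel("Month")
plt.ylabel("Sales")
plt.title("Styled Sales Trend")
ax = plt.gca()
plt.show()
```

Calling `plt.figure(figsize=...)` after `plt.plot` would open a second, empty figure, which is why the hint says to call it first.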
Estimated Time: 10 minutes
What You'll Build: Create a bar chart comparing sales across different product categories.
Success Criteria:
- plt.bar()
Hints:
- plt.bar(categories, values)
- plt.text(x, y, value)
- plt.xticks(rotation=45)
- color=['blue', 'green', 'red', ...]
Expected Output:
[Bar chart with 5 categories]
- Categories: Electronics, Clothing, Food, Books, Toys
- Different colored bars for each category
- Value labels displayed on top of each bar
Key Concepts:
Code Location: Cell 10 (search for # TODO 4:)
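A minimal sketch of a labeled bar chart is shown below. The category names come from the expected output above; the sales values and the last two colors are assumptions. It places labels with `plt.text` directly rather than with the notebook's `add_value_labels` helper, whose behavior is not described here.

```python
import matplotlib.pyplot as plt

categories = ["Electronics", "Clothing", "Food", "Books", "Toys"]
values = [5200, 3100, 4300, 1800, 2600]              # hypothetical sales values
colors = ["blue", "green", "red", "orange", "purple"]

plt.figure()
bars = plt.bar(categories, values, color=colors)

# Put each value just above its bar, centered horizontally.
for bar, value in zip(bars, values):
    x_center = bar.get_x() + bar.get_width() / 2
    plt.text(x_center, bar.get_height(), str(value), ha="center", va="bottom")

plt.xticks(rotation=45)
plt.ylabel("Sales")
plt.title("Sales by Category")
plt.tight_layout()   # keeps the rotated labels inside the figure
ax = plt.gca()
plt.show()
```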
Estimated Time: 15 minutes
What You'll Build: Create a grouped bar chart comparing Q1 and Q2 sales for multiple products.
Success Criteria:
Hints:
- x = np.arange(len(categories))
- width = 0.35
- plt.bar(x - width/2, q1_values, width) and plt.bar(x + width/2, q2_values, width)
- plt.xticks(x, categories)
Expected Output:
[Grouped bar chart]
- Two bars per product (Q1 in blue, Q2 in orange)
- Bars positioned side-by-side
- Legend distinguishing quarters
Key Concepts:
Code Location: Cell 12 (search for # TODO 5:)
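The offset arithmetic behind a grouped bar chart can be sketched as follows. Product names and quarterly values are invented for the example. Note that the bar width is passed to `plt.bar` explicitly; without it, the default width of 0.8 would make neighboring bars overlap.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data for illustration only.
categories = ["Product A", "Product B", "Product C", "Product D"]
q1_values = [120, 95, 140, 80]
q2_values = [135, 110, 125, 105]

x = np.arange(len(categories))   # one integer position per product: 0, 1, 2, 3
width = 0.35                     # width of a single bar

plt.figure()
# Shift each series half a bar width left or right of the group center.
plt.bar(x - width / 2, q1_values, width, label="Q1", color="tab:blue")
plt.bar(x + width / 2, q2_values, width, label="Q2", color="tab:orange")
plt.xticks(x, categories)        # label each group at its center
plt.ylabel("Sales")
plt.title("Q1 vs Q2 Sales")
plt.legend()
ax = plt.gca()
plt.show()
```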
Estimated Time: 8 minutes
What You'll Build: Create a horizontal bar chart for ranking data (top 5 products).
Success Criteria:
- plt.barh() for horizontal bars
Hints:
- plt.barh(categories, values)
- .sort_values(ascending=False)
- plt.gca().invert_yaxis()
- plt.text(value, y_position, f'{value}')
Expected Output:
[Horizontal bar chart]
- Bars extend from left to right
- Highest value at top
- Value labels at end of each bar
Key Concepts:
Code Location: Cell 14 (search for # TODO 6:)
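A ranking chart along these lines might look like the sketch below. The product names and sales counts are made up. To keep the example dependency-free it sorts a plain dictionary with `sorted`; on a pandas Series, the `.sort_values(ascending=False)` call from the hints plays the same role.

```python
import matplotlib.pyplot as plt

# Hypothetical sales counts for illustration only.
sales_by_product = {
    "Laptop": 540, "Phone": 820, "Tablet": 310, "Monitor": 450,
    "Headphones": 690, "Keyboard": 150, "Webcam": 95,
}

# Sort descending by value and keep the top 5.
ranked = sorted(sales_by_product.items(), key=lambda item: item[1], reverse=True)[:5]
names = [name for name, _ in ranked]
values = [value for _, value in ranked]

plt.figure()
plt.barh(names, values)
plt.gca().invert_yaxis()   # barh draws the first item at the bottom; flip it to the top

# Write each value just past the end of its bar.
for y_position, value in enumerate(values):
    plt.text(value, y_position, f" {value}", va="center")

plt.xlabel("Units Sold")
plt.title("Top 5 Products")
ax = plt.gca()
plt.show()
```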
Estimated Time: 10 minutes
What You'll Build: Create a scatter plot exploring the relationship between advertising spend and sales.
Success Criteria:
- plt.scatter()
Hints:
- plt.scatter(x_data, y_data)
- c=region_codes, cmap='viridis'
- plt.colorbar(label='Region')
- np.polyfit() and np.poly1d()
Expected Output:
[Scatter plot showing positive correlation]
- Points scattered showing trend
- Points colored by region
- Trend line overlaid
- Colorbar on right side
Key Concepts:
Code Location: Cell 16 (search for # TODO 7:)
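The sketch below shows one way to combine a colored scatter plot, a colorbar, and a linear trend line. All data is randomly generated for illustration (the notebook's `generate_scatter_data()` helper is not used, since its return format is not described here), and the slope of 3 is an arbitrary assumption.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data for illustration only: sales rise with ad spend, plus noise.
rng = np.random.default_rng(42)
ad_spend = rng.uniform(1000, 10000, size=60)
sales = 3.0 * ad_spend + rng.normal(0, 3000, size=60)
region_codes = rng.integers(0, 4, size=60)   # four hypothetical regions: 0..3

fig = plt.figure()
points = plt.scatter(ad_spend, sales, c=region_codes, cmap="viridis")
ax = plt.gca()
plt.colorbar(points, ax=ax, label="Region")  # maps colors back to region codes

# Fit a degree-1 polynomial (a straight line) and evaluate it across the x range.
slope, intercept = np.polyfit(ad_spend, sales, 1)
trend = np.poly1d([slope, intercept])
x_line = np.linspace(ad_spend.min(), ad_spend.max(), 100)
plt.plot(x_line, trend(x_line), color="red", linestyle="--", label="Trend line")

plt.xlabel("Advertising Spend")
plt.ylabel("Sales")
plt.title("Advertising Spend vs Sales")
plt.legend()
plt.show()
```

`np.polyfit` returns coefficients from the highest degree down, so for a degree-1 fit the first value is the slope and the second is the intercept.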
Estimated Time: 15 minutes
What You'll Build: Create a figure with 2x2 subplots showing different scatter plot analyses.
Success Criteria:
- plt.subplots(2, 2)
Hints:
- fig, axes = plt.subplots(2, 2, figsize=(12, 10))
- axes[0, 0].scatter(x, y)
- axes[0, 0].set_title('Title')
- fig.suptitle('Main Title', fontsize=16)
- plt.tight_layout()
Expected Output:
[2x2 grid of scatter plots]
- Top-left: Price vs Sales
- Top-right: Marketing vs Sales
- Bottom-left: Quantity vs Revenue
- Bottom-right: Discount vs Profit
Each with appropriate labels and styling
Key Concepts:
Code Location: Cell 18 (search for # TODO 8:)
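A 2x2 grid following the panel layout above could be sketched as below. Every variable is synthetic and the relationships between them are arbitrary assumptions chosen only to produce plausible-looking point clouds.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
n = 50
price = rng.uniform(5, 50, n)
marketing = rng.uniform(100, 1000, n)
quantity = rng.integers(1, 100, n)
discount = rng.uniform(0, 0.5, n)
sales = 5000 - 60 * price + 2 * marketing + rng.normal(0, 300, n)
revenue = price * quantity
profit = 2000 - 3000 * discount + rng.normal(0, 200, n)

# (x, y, x-label, y-label) for each panel, in row-major order.
panels = [
    (price, sales, "Price", "Sales"),             # top-left
    (marketing, sales, "Marketing", "Sales"),     # top-right
    (quantity, revenue, "Quantity", "Revenue"),   # bottom-left
    (discount, profit, "Discount", "Profit"),     # bottom-right
]

fig, axes = plt.subplots(2, 2, figsize=(12, 10))
# axes is a 2x2 array; axes.flat walks it left-to-right, top-to-bottom.
for ax, (x, y, xlabel, ylabel) in zip(axes.flat, panels):
    ax.scatter(x, y)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.set_title(f"{xlabel} vs {ylabel}")

fig.suptitle("Sales Analysis Dashboard", fontsize=16)
plt.tight_layout()   # prevents titles and labels from overlapping
plt.show()
```

Inside a subplot grid the per-axes methods (`ax.set_title`, `ax.set_xlabel`) replace the `plt.title` and `plt.xlabel` calls used in the single-plot TODOs.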
Run All Cells:
Visualization Validation:
Data Accuracy:
You don't need to write these - they're already working! Just call them with your data:
generate_sales_data() # Creates sample sales dataset
generate_scatter_data() # Creates correlated x-y data
create_category_data() # Generates categorical data
add_value_labels(ax, bars) # Adds labels on top of bars
format_currency(value) # Formats numbers as currency
save_plot(filename) # Saves figure to file
Issue: "ModuleNotFoundError: No module named 'matplotlib'"
Solution:
# In Colab, matplotlib is pre-installed
# If using local Jupyter, install dependencies:
pip install matplotlib
# Then restart kernel: Kernel → Restart
Issue: "Figure not displaying" or "Blank output"
Solution:
- %matplotlib inline magic is in first cell
- plt.show() at end of plotting code
- plt.switch_backend('agg') then plt.show()
- plt.clf() before plotting
Issue: "Session disconnected" or "Kernel restarting"
Solution:
- Cmd+S / Ctrl+S
Issue: "Text overlapping" or "Labels cut off"
Solution:
# Fix overlapping x-axis labels
plt.xticks(rotation=45, ha='right')
# Adjust layout to prevent cutoff
plt.tight_layout()
# Manually adjust margins
plt.subplots_adjust(bottom=0.15, left=0.15)
# Increase figure size
plt.figure(figsize=(12, 8))
Issue: "Plot axis limits too tight" or "Data cut off"
Solution:
# Set custom axis limits
plt.xlim(0, 100) # x-axis from 0 to 100
plt.ylim(0, 500) # y-axis from 0 to 500
# Add padding automatically
plt.margins(0.1) # 10% padding on all sides
# Let matplotlib auto-scale
plt.autoscale()
Issue: "Colors don't match" or "Plot looks different than expected"
Solution:
- color='blue' instead of relying on defaults
- color='#1f77b4' for exact hex colors
Issue: "Memory errors with large datasets"
Solution:
# Reduce number of points plotted
# For scatter plots with 100k+ points
# (assumes data is a pandas DataFrame with 'x' and 'y' columns):
sample_size = 10000
sample_indices = np.random.choice(len(data), sample_size, replace=False)
sample = data.iloc[sample_indices]  # select rows by position
plt.scatter(sample['x'], sample['y'])
# Use rasterization for large plots
plt.scatter(x, y, rasterized=True)
# Clear figure after saving
plt.savefig('plot.png')
plt.close() # Frees memory
Check Data Shape:
print(f"X data shape: {x_data.shape}")
print(f"Y data shape: {y_data.shape}")
# Should have compatible dimensions
Verify Plot Object:
# Check if plot was created
ax = plt.gca() # Get current axis
print(f"Number of lines: {len(ax.lines)}")
print(f"Number of patches (bars): {len(ax.patches)}")
Inspect Data Range:
print(f"X range: {x_data.min()} to {x_data.max()}")
print(f"Y range: {y_data.min()} to {y_data.max()}")
# Ensure data is within expected range
Test Incrementally:
- plt.plot(x, y)
Use Matplotlib Documentation:
This activity was tested with Google Colab runtime as of November 2024:
| Library | Version (Colab) | Notes |
|---|---|---|
| NumPy | 1.26.4 | Pre-installed |
| Pandas | 2.1.4 | Pre-installed |
| Matplotlib | 3.8.0 | Pre-installed |
Note: Colab libraries update periodically. If you encounter version conflicts:
!pip list | grep -E "numpy|pandas|matplotlib"
Completed the main activity? Try these bonus challenges:
Your activity is complete when:
- Cmd+S / Ctrl+S
- matplotlib/cheatsheets
Once you complete this activity, you'll have:
This activity equips you with essential data visualization skills used by data analysts and data scientists worldwide!
Stuck on a TODO?
Still stuck after 15 minutes?
Pro Tips:
- plt.savefig('plot.png', dpi=300, bbox_inches='tight') to export plots
Remember: Visualization is both art and science. It's OK to:
Happy plotting! 🚀📊✨
This activity follows the 65-70% implementation methodology: example visualizations are provided, and students complete targeted plotting exercises.