In this hands-on activity, you'll implement your first reinforcement learning agent! You'll create an agent that learns to balance a pole on a cart (CartPole environment) through trial and error.
By completing this activity, you will:
- Set up a reinforcement learning environment using Gymnasium
- Implement a basic agent-environment interaction loop
- Observe how random actions perform in RL tasks
- Visualize agent performance over multiple episodes
- Calculate and plot cumulative rewards
- Understand the importance of learning algorithms (coming in next lessons!)
Before you start, you should have:
- Completed Concept 01: Introduction to Reinforcement Learning
- Basic Python programming skills
- Google Colab account
Download the activity template from the Templates folder:
- Template: AI25-Template-activity-01-introduction-to-reinforcement-learning.zip
- Location: Templates/AI25-Template-activity-01-introduction-to-reinforcement-learning.zip
- Extract the ZIP file
- Upload activity-01-introduction-to-reinforcement-learning.ipynb to Google Colab
- Set Runtime to GPU: Runtime -> Change runtime type -> GPU (T4 recommended)
Execute the first few cells to:
- Verify GPU availability
- Install required packages (Gymnasium, PyTorch, etc.)
- Import libraries
You should see output confirming GPU access!
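A minimal sketch of what those setup cells typically look like in a Colab notebook; the template's actual cells, package list, and versions may differ:

```python
# Hypothetical setup cell; the template's installation cell may differ.
!pip install --quiet gymnasium matplotlib   # PyTorch comes preinstalled on Colab

import torch
import gymnasium as gym
import matplotlib.pyplot as plt

print("Gymnasium version:", gym.__version__)
print("GPU available:", torch.cuda.is_available())  # False is fine; CPU works for this activity
```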
You'll explore the CartPole-v1 environment (a short inspection sketch follows this list):
- Task: Keep a pole balanced on a moving cart
- State: 4 values (cart position, cart velocity, pole angle, pole angular velocity)
- Actions: 2 options (push cart left or right)
- Reward: +1 for each timestep the pole stays upright
- Goal: Maximize total reward by keeping pole balanced
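To get a feel for these numbers, here is a minimal inspection sketch using Gymnasium's standard API; the template's notebook may structure this differently:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
print(env.observation_space)  # 4-dimensional Box: cart position, cart velocity, pole angle, pole angular velocity
print(env.action_space)       # Discrete(2): 0 = push cart left, 1 = push cart right

obs, info = env.reset(seed=42)
print(obs)  # initial state: four small values near zero (pole starts nearly upright)

obs, reward, terminated, truncated, info = env.step(1)  # push right for one timestep
print(reward)  # +1.0 for every timestep the pole stays upright
env.close()
```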
Run a random agent that takes random actions (a minimal loop sketch appears after this list). You'll see:
- Immediate results in under 30 seconds!
- Performance visualization showing reward per episode
- Average reward calculation (typically 20-25 for a random policy)
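A minimal random-agent loop, assuming Gymnasium's five-value step API; the notebook's own cell may name things differently:

```python
import gymnasium as gym
import numpy as np

env = gym.make("CartPole-v1")
episode_rewards = []

for episode in range(20):
    obs, info = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()  # pick 0 (left) or 1 (right) uniformly at random
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated
    episode_rewards.append(total_reward)

print(f"Average reward over {len(episode_rewards)} episodes: {np.mean(episode_rewards):.1f}")
env.close()
```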
This is where you'll write code! You'll implement the following (helper sketches for TODOs 3 and 4 appear after the list):
TODO 1: Complete the episode loop to run multiple episodes
TODO 2: Store rewards and episode lengths for analysis
TODO 3: Calculate moving average for smoother visualization
TODO 4: Add early stopping when performance goal is reached
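As a starting point, here are hedged sketches of helpers for TODOs 3 and 4. The function names, the goal value of 200, and the window sizes are illustrative assumptions, not the template's exact specification:

```python
import numpy as np

def moving_average(values, window=10):
    """TODO 3 sketch: smooth a reward curve with a sliding-window mean."""
    values = np.asarray(values, dtype=float)
    if len(values) < window:
        return values
    return np.convolve(values, np.ones(window) / window, mode="valid")

def reached_goal(episode_rewards, goal=200.0, window=100):
    """TODO 4 sketch: stop early once the recent average reward clears the goal."""
    return len(episode_rewards) >= window and np.mean(episode_rewards[-window:]) >= goal
```

A random agent will essentially never trigger the early-stopping check, which is exactly the point of this baseline.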
Visualization dashboard showing the following (a plotting sketch appears after this list):
- Reward per episode
- Moving average trend
- Episode length statistics
- Comparison to random baseline
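One possible plotting sketch; the names episode_rewards, episode_lengths, and the baseline value of 22 are assumptions about how you store results, not requirements of the template:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_dashboard(episode_rewards, episode_lengths, window=10, random_baseline=22.0):
    """Plot per-episode reward, its moving average, and episode-length statistics."""
    episodes = np.arange(1, len(episode_rewards) + 1)
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

    # Left: reward per episode, smoothed trend, and a dashed random-baseline line.
    ax1.plot(episodes, episode_rewards, alpha=0.4, label="Reward per episode")
    if len(episode_rewards) >= window:
        smooth = np.convolve(episode_rewards, np.ones(window) / window, mode="valid")
        ax1.plot(episodes[window - 1:], smooth, label=f"{window}-episode moving average")
    ax1.axhline(random_baseline, color="red", linestyle="--", label="Random baseline")
    ax1.set_xlabel("Episode")
    ax1.set_ylabel("Reward")
    ax1.legend()

    # Right: distribution of episode lengths.
    ax2.hist(episode_lengths, bins=20)
    ax2.set_xlabel("Episode length (timesteps)")
    ax2.set_ylabel("Count")

    fig.tight_layout()
    plt.show()
```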
Expected results for the random agent:
- Average Reward: typically 20-25 (the pole falls after roughly 20-25 timesteps)
- Max Reward: Rarely exceeds 50
- Consistency: High variance, very unpredictable
Your implementation is complete when you have:
- A working agent-environment interaction loop
- Clean performance visualizations
- An understanding of why we need learning algorithms
- A baseline to compare against future RL agents
Why the random agent struggles:
- The pole starts nearly vertical
- Small angle deviations accumulate quickly
- Random actions can't anticipate which direction to push
- You need a learning algorithm to master this task (coming next lesson!)
Common issues:
- Import errors: Ensure all cells run in order from top to bottom
- GPU not available: This activity works fine on CPU too
- Plotting errors: Check that reward lists aren't empty
Remember: A random agent performs poorly! That's the point. This establishes why we need:
- Lesson 2: Q-Learning to learn state-action values
- Lesson 3: Deep Q-Networks for complex tasks
- Lessons 4-6: Advanced RL algorithms
Once you complete the basic activity, try these extensions:
Replace CartPole with other Gymnasium environments (see the sketch after this list):
- MountainCar-v0: Drive a car up a steep hill
- Acrobot-v1: Swing a two-link robot to a goal height
- LunarLander-v2: Land a spacecraft safely
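Swapping environments only changes the id passed to gym.make; a small sketch follows. Note that LunarLander needs the Box2D extras on Colab, and newer Gymnasium releases may expect LunarLander-v3 instead of v2:

```python
import gymnasium as gym

# For LunarLander you may first need: !pip install "gymnasium[box2d]"
for env_id in ["MountainCar-v0", "Acrobot-v1", "LunarLander-v2"]:
    env = gym.make(env_id)
    print(env_id, "| observations:", env.observation_space.shape, "| actions:", env.action_space)
    env.close()
```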
Create a slightly smarter random policy (a sketch appears after this list):
- If pole leans left, push left 70% of the time, right 30%
- See if this simple heuristic improves performance
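A hedged sketch of one way to implement this bias, assuming observation index 2 is the pole angle (negative means leaning left) and action 0 pushes the cart left:

```python
import gymnasium as gym
import numpy as np

rng = np.random.default_rng(0)

def tilt_biased_policy(obs):
    """Push toward the side the pole is leaning 70% of the time, the other way 30%."""
    pole_angle = obs[2]                         # negative angle = pole leaning left
    if pole_angle < 0:
        return 0 if rng.random() < 0.7 else 1   # 0 = push cart left
    return 1 if rng.random() < 0.7 else 0       # 1 = push cart right

env = gym.make("CartPole-v1")
obs, info = env.reset()
total_reward, done = 0.0, False
while not done:
    obs, reward, terminated, truncated, info = env.step(tilt_biased_policy(obs))
    total_reward += reward
    done = terminated or truncated
print("Episode reward with the tilt-biased policy:", total_reward)
env.close()
```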
Run 10 random agents and plot average performance with confidence intervals.
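A sketch of how you might aggregate 10 random agents; the 50-episode count and the ±1 standard deviation band are illustrative choices:

```python
import gymnasium as gym
import numpy as np
import matplotlib.pyplot as plt

def run_random_agent(num_episodes=50, seed=None):
    """Return per-episode rewards for one independent random agent."""
    env = gym.make("CartPole-v1")
    env.reset(seed=seed)           # seed the environment's RNG once
    env.action_space.seed(seed)    # seed the action sampler too
    rewards = []
    for _ in range(num_episodes):
        obs, info = env.reset()
        total, done = 0.0, False
        while not done:
            obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
            total += reward
            done = terminated or truncated
        rewards.append(total)
    env.close()
    return np.array(rewards)

# Stack 10 agents into a (10, num_episodes) array and plot mean ± 1 std.
all_rewards = np.stack([run_random_agent(seed=i) for i in range(10)])
mean, std = all_rewards.mean(axis=0), all_rewards.std(axis=0)
episodes = np.arange(1, all_rewards.shape[1] + 1)

plt.plot(episodes, mean, label="Mean reward (10 random agents)")
plt.fill_between(episodes, mean - std, mean + std, alpha=0.3, label="±1 std")
plt.xlabel("Episode")
plt.ylabel("Reward")
plt.legend()
plt.show()
```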
Implement early stopping when reward exceeds a threshold (though random agents rarely achieve this).
What to submit:
- Completed Notebook: activity-01-introduction-to-reinforcement-learning.ipynb
  - All code cells executed
  - Output visible for all cells
  - TODOs completed
- Performance Report: Brief summary including:
  - Average reward for random agent
  - Best episode reward
  - Observations about random agent limitations
- Run all cells from top to bottom (Runtime -> Run all)
- Verify all visualizations display correctly
- Download notebook: File -> Download -> Download .ipynb
- Submit via [course portal link]
Related concepts worth reviewing:
- Markov Decision Processes (MDPs)
- Exploration vs Exploitation
- Episodic vs Continuing Tasks
- State representation in RL
After completing this activity:
- Concept 02: Q-Learning and Value Functions
- Activity 02: Implement tabular Q-Learning for FrozenLake
- Concept 03: Deep Q-Networks for complex state spaces
In the next activity, you'll implement a learning algorithm that achieves 200+ average reward on CartPole, a dramatic improvement over the random baseline!
Troubleshooting tips:
- Package or import errors: Ensure the installation cell ran successfully. Restart the runtime if needed.
- Plots not displaying: Make sure matplotlib is imported and %matplotlib inline is set.
- Rewards stay low: This is expected for random agents! That's why we need learning algorithms.
- Notebook runs slowly: Reduce the number of episodes or enable the GPU runtime (though CPU is fine for this activity).
This activity is graded on:
- Code Completion (40%): All TODOs implemented correctly
- Code Quality (30%): Clean, readable, properly commented
- Visualizations (20%): Plots display correctly with proper labels
- Understanding (10%): Brief report demonstrates grasp of concepts
Passing Grade: 70% or higher
Good luck, and enjoy your first hands-on reinforcement learning experience! 🚀