In this hands-on activity, you'll implement your first reinforcement learning agent! You'll create an agent that learns to balance a pole on a cart (CartPole environment) through trial and error.
By completing this activity, you will:
- Set up a reinforcement learning environment using Gymnasium
- Implement a basic agent-environment interaction loop
- Observe how random actions perform in RL tasks
- Visualize agent performance over multiple episodes
- Calculate and plot cumulative rewards
- Understand the importance of learning algorithms (coming in next lessons!)
Before you start, you should have:
- Completed Concept 01: Introduction to Reinforcement Learning
- Basic Python programming skills
- Google Colab account
Download the activity template from the Templates folder:
- Template: AI25-Template-activity-01-introduction-to-reinforcement-learning.zip
- Location: Templates/AI25-Template-activity-01-introduction-to-reinforcement-learning.zip
- Extract the ZIP file
- Upload activity-01-introduction-to-reinforcement-learning.ipynb to Google Colab
- Set Runtime to GPU: Runtime -> Change runtime type -> GPU (T4 recommended)
Execute the first few cells to:
- Verify GPU availability
- Install required packages (Gymnasium, PyTorch, etc.)
- Import libraries
You should see output confirming GPU access!
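A minimal sketch of what those setup cells typically look like in a Colab notebook; the template's actual cells, package list, and versions may differ:

```python
# Hypothetical setup cell; the template's installation cell may differ.
!pip install --quiet gymnasium matplotlib   # PyTorch comes preinstalled on Colab

import torch
import gymnasium as gym
import matplotlib.pyplot as plt

print("Gymnasium version:", gym.__version__)
print("GPU available:", torch.cuda.is_available())  # False is fine; CPU works for this activity
```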
You'll explore the CartPole-v1 environment (a short inspection sketch follows this list):
- Task: Keep a pole balanced on a moving cart
- State: 4 values (cart position, cart velocity, pole angle, pole angular velocity)
- Actions: 2 options (push cart left or right)
- Reward: +1 for each timestep the pole stays upright
- Goal: Maximize total reward by keeping pole balanced
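To get a feel for these numbers, here is a minimal inspection sketch using Gymnasium's standard API; the template's notebook may structure this differently:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
print(env.observation_space)  # 4-dimensional Box: cart position, cart velocity, pole angle, pole angular velocity
print(env.action_space)       # Discrete(2): 0 = push cart left, 1 = push cart right

obs, info = env.reset(seed=42)
print(obs)  # initial state: four small values near zero (pole starts nearly upright)

obs, reward, terminated, truncated, info = env.step(1)  # push right for one timestep
print(reward)  # +1.0 for every timestep the pole stays upright
env.close()
```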
Run a random agent that takes random actions (a minimal loop sketch appears after this list). You'll see:
- Immediate results in under 30 seconds!
- Performance visualization showing reward per episode
- Average reward calculation (typically 20-25 for a random policy)
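A minimal random-agent loop, assuming Gymnasium's five-value step API; the notebook's own cell may name things differently:

```python
import gymnasium as gym
import numpy as np

env = gym.make("CartPole-v1")
episode_rewards = []

for episode in range(20):
    obs, info = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = env.action_space.sample()  # pick 0 (left) or 1 (right) uniformly at random
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated
    episode_rewards.append(total_reward)

print(f"Average reward over {len(episode_rewards)} episodes: {np.mean(episode_rewards):.1f}")
env.close()
```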
This is where you'll write code! You'll implement the following (helper sketches for TODOs 3 and 4 appear after the list):
TODO 1: Complete the episode loop to run multiple episodes
TODO 2: Store rewards and episode lengths for analysis
TODO 3: Calculate moving average for smoother visualization
TODO 4: Add early stopping when performance goal is reached
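As a starting point, here are hedged sketches of helpers for TODOs 3 and 4. The function names, the goal value of 200, and the window sizes are illustrative assumptions, not the template's exact specification:

```python
import numpy as np

def moving_average(values, window=10):
    """TODO 3 sketch: smooth a reward curve with a sliding-window mean."""
    values = np.asarray(values, dtype=float)
    if len(values) < window:
        return values
    return np.convolve(values, np.ones(window) / window, mode="valid")

def reached_goal(episode_rewards, goal=200.0, window=100):
    """TODO 4 sketch: stop early once the recent average reward clears the goal."""
    return len(episode_rewards) >= window and np.mean(episode_rewards[-window:]) >= goal
```

A random agent will essentially never trigger the early-stopping check, which is exactly the point of this baseline.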
Visualization dashboard showing the following (a plotting sketch appears after this list):
- Reward per episode
- Moving average trend
- Episode length statistics
- Comparison to random baseline
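One possible plotting sketch; the names episode_rewards, episode_lengths, and the baseline value of 22 are assumptions about how you store results, not requirements of the template:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_dashboard(episode_rewards, episode_lengths, window=10, random_baseline=22.0):
    """Plot per-episode reward, its moving average, and episode-length statistics."""
    episodes = np.arange(1, len(episode_rewards) + 1)
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

    # Left: reward per episode, smoothed trend, and a dashed random-baseline line.
    ax1.plot(episodes, episode_rewards, alpha=0.4, label="Reward per episode")
    if len(episode_rewards) >= window:
        smooth = np.convolve(episode_rewards, np.ones(window) / window, mode="valid")
        ax1.plot(episodes[window - 1:], smooth, label=f"{window}-episode moving average")
    ax1.axhline(random_baseline, color="red", linestyle="--", label="Random baseline")
    ax1.set_xlabel("Episode")
    ax1.set_ylabel("Reward")
    ax1.legend()

    # Right: distribution of episode lengths.
    ax2.hist(episode_lengths, bins=20)
    ax2.set_xlabel("Episode length (timesteps)")
    ax2.set_ylabel("Count")

    fig.tight_layout()
    plt.show()
```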
Expected results for the random agent:
- Average Reward: typically 20-25 (the pole falls after roughly 20-25 timesteps)
- Max Reward: Rarely exceeds 50
- Consistency: High variance, very unpredictable
Your implementation is complete when you have:
- A working agent-environment interaction loop
- Clean performance visualizations
- An understanding of why we need learning algorithms
- A baseline to compare against future RL agents
Why the random agent struggles:
- The pole starts nearly vertical
- Small angle deviations accumulate quickly
- Random actions can't anticipate which direction to push
- You need a learning algorithm to master this task (coming next lesson!)
Common issues:
- Import errors: Ensure all cells run in order from top to bottom
- GPU not available: This activity works fine on CPU too
- Plotting errors: Check that reward lists aren't empty
Remember: A random agent performs poorly! That's the point. This establishes why we need:
- Lesson 2: Q-Learning to learn state-action values
- Lesson 3: Deep Q-Networks for complex tasks
- Lessons 4-6: Advanced RL algorithms
Once you complete the basic activity, try these extensions:
Replace CartPole with other Gymnasium environments (see the sketch after this list):
- MountainCar-v0: Drive a car up a steep hill
- Acrobot-v1: Swing a two-link robot to a goal height
- LunarLander-v2: Land a spacecraft safely
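Swapping environments only changes the id passed to gym.make; a small sketch follows. Note that LunarLander needs the Box2D extras on Colab, and newer Gymnasium releases may expect LunarLander-v3 instead of v2:

```python
import gymnasium as gym

# For LunarLander you may first need: !pip install "gymnasium[box2d]"
for env_id in ["MountainCar-v0", "Acrobot-v1", "LunarLander-v2"]:
    env = gym.make(env_id)
    print(env_id, "| observations:", env.observation_space.shape, "| actions:", env.action_space)
    env.close()
```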
Create a slightly smarter random policy (a sketch appears after this list):
- If pole leans left, push left 70% of the time, right 30%
- See if this simple heuristic improves performance
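A hedged sketch of one way to implement this bias, assuming observation index 2 is the pole angle (negative means leaning left) and action 0 pushes the cart left:

```python
import gymnasium as gym
import numpy as np

rng = np.random.default_rng(0)

def tilt_biased_policy(obs):
    """Push toward the side the pole is leaning 70% of the time, the other way 30%."""
    pole_angle = obs[2]                         # negative angle = pole leaning left
    if pole_angle < 0:
        return 0 if rng.random() < 0.7 else 1   # 0 = push cart left
    return 1 if rng.random() < 0.7 else 0       # 1 = push cart right

env = gym.make("CartPole-v1")
obs, info = env.reset()
total_reward, done = 0.0, False
while not done:
    obs, reward, terminated, truncated, info = env.step(tilt_biased_policy(obs))
    total_reward += reward
    done = terminated or truncated
print("Episode reward with the tilt-biased policy:", total_reward)
env.close()
```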
Run 10 random agents and plot average performance with confidence intervals.
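A sketch of how you might aggregate 10 random agents; the 50-episode count and the ±1 standard deviation band are illustrative choices:

```python
import gymnasium as gym
import numpy as np
import matplotlib.pyplot as plt

def run_random_agent(num_episodes=50, seed=None):
    """Return per-episode rewards for one independent random agent."""
    env = gym.make("CartPole-v1")
    env.reset(seed=seed)           # seed the environment's RNG once
    env.action_space.seed(seed)    # seed the action sampler too
    rewards = []
    for _ in range(num_episodes):
        obs, info = env.reset()
        total, done = 0.0, False
        while not done:
            obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
            total += reward
            done = terminated or truncated
        rewards.append(total)
    env.close()
    return np.array(rewards)

# Stack 10 agents into a (10, num_episodes) array and plot mean ± 1 std.
all_rewards = np.stack([run_random_agent(seed=i) for i in range(10)])
mean, std = all_rewards.mean(axis=0), all_rewards.std(axis=0)
episodes = np.arange(1, all_rewards.shape[1] + 1)

plt.plot(episodes, mean, label="Mean reward (10 random agents)")
plt.fill_between(episodes, mean - std, mean + std, alpha=0.3, label="±1 std")
plt.xlabel("Episode")
plt.ylabel("Reward")
plt.legend()
plt.show()
```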
Implement early stopping when reward exceeds a threshold (though random agents rarely achieve this).
What to submit:
- Completed Notebook: activity-01-introduction-to-reinforcement-learning.ipynb
  - All code cells executed
  - Output visible for all cells
  - TODOs completed
- Performance Report: Brief summary including:
  - Average reward for random agent
  - Best episode reward
  - Observations about random agent limitations
- Run all cells from top to bottom (Runtime -> Run all)
- Verify all visualizations display correctly
- Download notebook: File -> Download -> Download .ipynb
- Submit via [course portal link]
Related concepts worth reviewing:
- Markov Decision Processes (MDPs)
- Exploration vs Exploitation
- Episodic vs Continuing Tasks
- State representation in RL
After completing this activity:
- Concept 02: Q-Learning and Value Functions
- Activity 02: Implement tabular Q-Learning for FrozenLake
- Concept 03: Deep Q-Networks for complex state spaces
In the next activity, you'll implement a learning algorithm that achieves 200+ average reward on CartPole, a dramatic improvement over the random baseline!
Troubleshooting tips:
- Package or import errors: Ensure the installation cell ran successfully. Restart the runtime if needed.
- Plots not displaying: Make sure matplotlib is imported and %matplotlib inline is set.
- Rewards stay low: This is expected for random agents! That's why we need learning algorithms.
- Notebook runs slowly: Reduce the number of episodes or enable the GPU runtime (though CPU is fine for this activity).
This activity is graded on:
- Code Completion (40%): All TODOs implemented correctly
- Code Quality (30%): Clean, readable, properly commented
- Visualizations (20%): Plots display correctly with proper labels
- Understanding (10%): Brief report demonstrates grasp of concepts
Passing Grade: 70% or higher
Good luck, and enjoy your first hands-on reinforcement learning experience! 🚀