Student starter code (30% baseline)
- `index.html` - Main HTML page
- `script.js` - JavaScript logic
- `styles.css` - Styling and layout
- `package.json` - Dependencies
- `setup.sh` - Setup script
- `README.md` - Instructions (below)

💡 Download the ZIP, extract it, and follow the instructions below to get started!
By completing this activity, you will:
To run the notebook, use Runtime -> Run all (or press Ctrl+F9). Expected first run time: ~45 seconds.
The template comes with 65% working code:
Location: Section 7 - "Agent Implementation"
Current State: The agent takes completely random actions (50/50 left/right)
Your Task: Implement an epsilon-greedy policy that:
Starter Code Provided:
```python
class EpsilonGreedyAgent:
    def __init__(self, epsilon=0.3):
        self.epsilon = epsilon

    def select_action(self, observation):
        # TODO: Implement epsilon-greedy logic here
        # Hint: observation[2] is the pole angle
        # Hint: action 0 = left, action 1 = right
        pass
```
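If you get stuck, the general shape of an epsilon-greedy `select_action` is sketched below. This is one possible solution using the pole-angle hint, not the only correct implementation:

```python
import random

class EpsilonGreedyAgent:
    def __init__(self, epsilon=0.3):
        self.epsilon = epsilon

    def select_action(self, observation):
        # Explore: with probability epsilon, act randomly.
        if random.random() < self.epsilon:
            return random.choice([0, 1])
        # Exploit: follow the pole-angle heuristic from the hints.
        # Pole leaning right (positive angle) -> push right (action 1).
        return 1 if observation[2] > 0 else 0
```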
Success Criteria:
Location: Section 9 - "Analysis and Visualization"
Your Task: Track and visualize how often the agent takes each action (left vs right)
Requirements:
- Compute the balance ratio: min(left, right) / max(left, right)

Success Criteria:
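A minimal sketch of one way to track action counts and compute the balance ratio. It assumes a standard Gymnasium CartPole environment and your completed agent from TODO 1:

```python
import gymnasium as gym
import matplotlib.pyplot as plt

env = gym.make("CartPole-v1")
agent = EpsilonGreedyAgent(epsilon=0.3)  # your completed agent from TODO 1

# Count how often each action is taken over one episode.
action_counts = {0: 0, 1: 0}  # 0 = left, 1 = right
observation, info = env.reset()
done = False
while not done:
    action = agent.select_action(observation)
    action_counts[action] += 1
    observation, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated

# Balance ratio: 1.0 = perfectly even usage, near 0.0 = one-sided.
left, right = action_counts[0], action_counts[1]
ratio = min(left, right) / max(left, right)
print(f"left={left}, right={right}, balance ratio={ratio:.2f}")

plt.bar(["left", "right"], [left, right])
plt.ylabel("times taken")
plt.title("Action frequency")
plt.show()
```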
Location: Section 10 - "Policy Comparison"
Your Task: Run experiments comparing three policies:
Requirements:
Success Criteria:
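One way to structure the comparison is sketched below. The three-policy lineup shown here (pure random, pure heuristic, and the mixed epsilon-greedy agent) is an assumption; match it to whatever Section 10 specifies:

```python
import gymnasium as gym
import numpy as np

env = gym.make("CartPole-v1")

def run_episodes(agent, env, n_episodes=50):
    """Return the total reward earned in each episode."""
    returns = []
    for _ in range(n_episodes):
        observation, info = env.reset()
        total, done = 0.0, False
        while not done:
            action = agent.select_action(observation)
            observation, reward, terminated, truncated, info = env.step(action)
            total += reward
            done = terminated or truncated
        returns.append(total)
    return returns

# Assumed lineup: pure random (epsilon=1.0), pure heuristic (epsilon=0.0),
# and the mixed agent from TODO 1 -- adjust to match Section 10.
policies = {
    "random (eps=1.0)": EpsilonGreedyAgent(epsilon=1.0),
    "heuristic (eps=0.0)": EpsilonGreedyAgent(epsilon=0.0),
    "epsilon-greedy (eps=0.3)": EpsilonGreedyAgent(epsilon=0.3),
}
for name, agent in policies.items():
    r = run_episodes(agent, env)
    print(f"{name}: mean return {np.mean(r):.1f} +/- {np.std(r):.1f}")
```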
Location: New section you'll create
Your Task: Find the optimal epsilon value through systematic testing
Requirements:
Success Criteria:
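A possible shape for the sweep, reusing `run_episodes()` from the TODO 3 sketch. The epsilon grid and episode count here are arbitrary choices; widen or refine them as your results suggest:

```python
import numpy as np
import matplotlib.pyplot as plt

# Reuses env and run_episodes() from the TODO 3 sketch.
epsilons = [0.0, 0.1, 0.2, 0.3, 0.5, 0.7, 1.0]
mean_returns = [np.mean(run_episodes(EpsilonGreedyAgent(epsilon=e), env, n_episodes=30))
                for e in epsilons]

plt.plot(epsilons, mean_returns, marker="o")
plt.xlabel("epsilon")
plt.ylabel("mean episode return")
plt.title("Epsilon sweep")
plt.show()

print("Best epsilon in this sweep:", epsilons[int(np.argmax(mean_returns))])
```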
Once you've completed all TODOs, try these advanced challenges:
Implement epsilon decay: Start with high exploration (ε=0.9) and gradually reduce to low exploration (ε=0.1) over episodes. Does this improve performance?
Improve the heuristic by considering both pole angle AND cart velocity. Use this improved heuristic in your epsilon-greedy agent.
Test your epsilon-greedy agent on other Gymnasium environments:
- MountainCar-v0
- Acrobot-v1
- LunarLander-v2 (requires Box2D: pip install box2d-py)

Does the same epsilon work well across environments?
Implement a simple Q-table to learn optimal actions:
Q(s,a) += α * (reward + γ * max_Q(s') - Q(s,a))
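A minimal tabular Q-learning sketch for CartPole is shown below. The state discretization, bin bounds, and hyperparameters are illustrative assumptions, and for simplicity the sketch bootstraps even at terminal states:

```python
import gymnasium as gym
import numpy as np
from collections import defaultdict

env = gym.make("CartPole-v1")

def discretize(observation, bins=6):
    """Map CartPole's continuous observation to a coarse discrete state."""
    # Hand-picked clipping bounds for this sketch; tune as needed.
    bounds = [(-2.4, 2.4), (-3.0, 3.0), (-0.21, 0.21), (-3.0, 3.0)]
    state = []
    for value, (lo, hi) in zip(observation, bounds):
        clipped = min(max(value, lo), hi)
        state.append(int((clipped - lo) / (hi - lo) * (bins - 1)))
    return tuple(state)

Q = defaultdict(lambda: np.zeros(2))  # Q[state] -> value of each action
alpha, gamma, epsilon = 0.1, 0.99, 0.1

for episode in range(500):
    observation, info = env.reset()
    state, done = discretize(observation), False
    while not done:
        # Epsilon-greedy over the current Q estimates.
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        observation, reward, terminated, truncated, info = env.step(action)
        next_state = discretize(observation)
        # The tabular update from the formula above.
        Q[state][action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state][action])
        state, done = next_state, terminated or truncated
```

Before submitting, ensure you've completed: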
Minimum Requirements (for passing):
Target Grade (for excellent work):
Exceptional Work (bonus points):
Solution: Run the installation cell at the top of the notebook:
```
!pip install gymnasium[classic_control] matplotlib imageio imageio-ffmpeg
```
Solution: This is normal in Colab's headless mode. The notebook uses `rgb_array` render mode and saves videos instead; check the `/content/videos/` directory for recordings.
Solution:

- Make sure you are reading observation[2] (the pole angle) correctly

Solution: Download the MP4 file and play locally, or use Colab's built-in video player:
```python
from IPython.display import Video
Video('/content/videos/episode_0.mp4', width=400)
```
Submission Checklist:
- activity-01-[YourName].ipynb

After completing this activity:
Key Insight: This activity introduced you to the RL loop with a simple heuristic. In Activity 02, you'll learn how agents can discover optimal policies without hand-coded heuristics!
Good luck! Remember: the goal is to understand how RL agents interact with environments, not to achieve perfect performance. Focus on learning and experimentation! 🚀