Apply your knowledge to build something amazing!
Duration: 2 weeks
Points: 100
Prerequisites: Complete Lessons 1-3 (RL Fundamentals, Q-Learning, DQN)
Difficulty: Intermediate
In this project, you'll train a Deep Q-Network (DQN) agent to master an Atari game from raw pixel observations. You'll implement the complete DQN algorithm including experience replay, target networks, and epsilon-greedy exploration. Your agent will learn to play at human-level or beyond, demonstrating the power of deep reinforcement learning.
Why This Matters: DQN was the breakthrough algorithm that showed deep learning could solve complex RL tasks directly from high-dimensional inputs (images). This project replicates the DeepMind research that started the modern deep RL revolution.
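Before diving in, here is a minimal sketch of the epsilon-greedy action selection mentioned above. The standalone `select_action` helper and its arguments are illustrative assumptions; in your project this logic will typically live as a method on your agent class in `dqn_agent.py`.

```python
import random
import numpy as np
import torch

def select_action(q_network, state, epsilon, num_actions, device="cpu"):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily.

    Hypothetical helper for illustration; adapt it to your own agent class.
    """
    if random.random() < epsilon:
        return random.randrange(num_actions)  # explore: uniform random action
    with torch.no_grad():
        # state: (4, 84, 84) stacked frames -> add a batch dimension
        state_t = torch.as_tensor(np.asarray(state), dtype=torch.float32,
                                  device=device).unsqueeze(0)
        q_values = q_network(state_t)              # shape: (1, num_actions)
        return int(q_values.argmax(dim=1).item()) # exploit: highest Q-value
```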
What You'll Build:
By completing this project, you will:
Your DQN agent must:
Your implementation must include:
project-01-dqn-game-master/
├── README.md # Project documentation
├── requirements.txt # Python dependencies
├── dqn_agent.py # DQN agent implementation
├── replay_buffer.py # Experience replay buffer
├── train.py # Training script
├── evaluate.py # Evaluation and video generation
├── models/ # Saved model checkpoints
├── logs/ # TensorBoard logs
└── videos/ # Recorded gameplay videos
| Criterion | Points | Description |
|---|---|---|
| DQN Implementation | 30 | Correct DQN algorithm with experience replay and target network |
| Performance | 25 | Agent reaches target performance threshold |
| Code Quality | 15 | Clean, modular, well-documented code |
| Hyperparameter Tuning | 10 | Evidence of systematic tuning (log results) |
| Visualization | 10 | TensorBoard logs and demo video |
| Documentation | 10 | Clear README with setup instructions and results |
| Total | 100 | |
Bonus Points (+10 each):
Day 1-2: Environment Setup (see the environment setup sketch after this timeline)
Day 3-4: DQN Components
Day 5-7: Training Pipeline
Deliverable: Working training script that logs to TensorBoard
Day 8-10: Hyperparameter Tuning
Day 11-12: Extended Training
Day 13-14: Evaluation and Documentation
Deliverable: Trained agent, demo video, complete documentation
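For the Day 1-2 environment setup, here is a minimal sketch using gymnasium's Atari wrappers. The exact package extras, environment IDs, and wrapper names (`FrameStackObservation` in recent gymnasium releases, `FrameStack` in older ones) vary by version, so treat this as a starting point rather than the required configuration. Note that the training skeleton later in this document uses the older gym `reset()`/`step()` signatures, so adapt whichever side matches your installed version.

```python
# Minimal Atari environment setup sketch, assuming gymnasium + ale-py are installed
# (e.g. pip install "gymnasium[atari]" plus the Atari ROMs for your version).
import gymnasium as gym
from gymnasium.wrappers import AtariPreprocessing, FrameStackObservation
# On some gymnasium/ale-py versions you may also need:
#   import ale_py; gym.register_envs(ale_py)

def make_env(env_id="ALE/Breakout-v5", render_mode=None):
    # frameskip=1 on the base env because AtariPreprocessing applies its own frame skip
    env = gym.make(env_id, frameskip=1, render_mode=render_mode)
    env = AtariPreprocessing(env, frame_skip=4, screen_size=84,
                             grayscale_obs=True, scale_obs=False)
    env = FrameStackObservation(env, stack_size=4)  # observations become (4, 84, 84)
    return env

if __name__ == "__main__":
    env = make_env()
    obs, info = env.reset()     # gymnasium reset() returns (obs, info)
    print(obs.shape)            # expected: (4, 84, 84)
```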
import torch
import torch.nn as nn
class DQN(nn.Module):
    def __init__(self, num_actions):
        super().__init__()
        # Input: 4 stacked frames of 84x84 grayscale images
        self.conv1 = nn.Conv2d(4, 32, kernel_size=8, stride=4)   # Output: 32x20x20
        self.conv2 = nn.Conv2d(32, 64, kernel_size=4, stride=2)  # Output: 64x9x9
        self.conv3 = nn.Conv2d(64, 64, kernel_size=3, stride=1)  # Output: 64x7x7
        # Fully connected layers
        self.fc1 = nn.Linear(64 * 7 * 7, 512)
        self.fc2 = nn.Linear(512, num_actions)

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        x = torch.relu(self.conv3(x))
        x = x.view(x.size(0), -1)  # Flatten
        x = torch.relu(self.fc1(x))
        q_values = self.fc2(x)
        return q_values
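A quick sanity check with a dummy batch confirms the 64 * 7 * 7 flattened size used by `fc1` (the action count of 6 here is just an example):

```python
# Usage example for the DQN class above: verify output shapes with a dummy batch
net = DQN(num_actions=6)             # e.g. 6 actions; use env.action_space.n in practice
dummy = torch.zeros(2, 4, 84, 84)    # (batch, stacked frames, height, width)
q_values = net(dummy)
print(q_values.shape)                # expected: torch.Size([2, 6])
```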
import numpy as np
import random
from collections import deque
class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # Append one transition; the oldest is evicted once capacity is reached
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniformly sample a minibatch and regroup it into per-field arrays
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return (
            np.array(states),
            np.array(actions),
            np.array(rewards),
            np.array(next_states),
            np.array(dones)
        )

    def __len__(self):
        return len(self.buffer)
from torch.utils.tensorboard import SummaryWriter

def train_dqn(env, agent, num_steps=1_000_000):
    writer = SummaryWriter(log_dir='logs/tensorboard')
    state = env.reset()  # older gym API; gymnasium's reset() returns (obs, info)
    episode_reward = 0
    episode_count = 0
    loss = None  # no updates happen until the buffer holds enough transitions

    for step in range(num_steps):
        # Select action with epsilon-greedy (linear decay from 1.0 to 0.1 over 100k steps)
        epsilon = max(0.1, 1.0 - 0.9 * step / 100_000)
        action = agent.select_action(state, epsilon)

        # Execute action (older gym API; gymnasium's step() returns
        # obs, reward, terminated, truncated, info)
        next_state, reward, done, info = env.step(action)
        episode_reward += reward

        # Store transition
        agent.replay_buffer.push(state, action, reward, next_state, done)

        # Train on a batch once the buffer is large enough
        if len(agent.replay_buffer) > agent.batch_size:
            loss = agent.update()

        # Periodically sync the target network with the online network
        if step % agent.target_update_freq == 0:
            agent.update_target_network()

        # Log metrics
        if step % 1000 == 0 and loss is not None:
            writer.add_scalar('Loss', loss, step)
            writer.add_scalar('Epsilon', epsilon, step)

        # Handle episode end
        if done:
            writer.add_scalar('Reward', episode_reward, episode_count)
            episode_reward = 0
            episode_count += 1
            state = env.reset()
        else:
            state = next_state
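The loop above calls `agent.update()` and `agent.update_target_network()`, which are left for you to implement. Below is a minimal sketch of one possible update step; the attribute names (`q_network`, `target_network`, `optimizer`, `gamma`) are assumptions about how you might organize `dqn_agent.py`, not a required interface.

```python
import torch
import torch.nn.functional as F

def dqn_update(agent):
    """One gradient step on a sampled minibatch (a sketch, not the only valid design)."""
    states, actions, rewards, next_states, dones = agent.replay_buffer.sample(agent.batch_size)

    states = torch.as_tensor(states, dtype=torch.float32)
    actions = torch.as_tensor(actions, dtype=torch.int64)
    rewards = torch.as_tensor(rewards, dtype=torch.float32)
    next_states = torch.as_tensor(next_states, dtype=torch.float32)
    dones = torch.as_tensor(dones, dtype=torch.float32)

    # Q(s, a) for the actions actually taken
    q_values = agent.q_network(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Bootstrapped target: r + gamma * max_a' Q_target(s', a'), zeroed at terminal states
    with torch.no_grad():
        next_q = agent.target_network(next_states).max(dim=1).values
        targets = rewards + agent.gamma * next_q * (1.0 - dones)

    loss = F.smooth_l1_loss(q_values, targets)  # Huber loss, commonly used for DQN
    agent.optimizer.zero_grad()
    loss.backward()
    agent.optimizer.step()
    return loss.item()

def update_target_network(agent):
    # Hard copy of online weights into the target network
    agent.target_network.load_state_dict(agent.q_network.state_dict())
```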
| Hyperparameter | Recommended Value | Notes |
|---|---|---|
| Learning rate | 2.5e-4 | RMSprop optimizer |
| Batch size | 32 | Smaller is more sample efficient |
| Replay buffer size | 100,000 | Balance memory and diversity |
| Target update frequency | 10,000 steps | Too frequent -> unstable |
| Epsilon start | 1.0 | Full exploration initially |
| Epsilon end | 0.1 | Small exploration always |
| Epsilon decay steps | 100,000 | ~10% of total training |
| Discount factor (γ) | 0.99 | Standard for Atari |
| Frame stack | 4 | Capture motion information |
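One convenient way to keep tuning organized is to gather the values from the table above into a single config object. The dataclass below is just one possible layout, and the field names are illustrative rather than required.

```python
from dataclasses import dataclass

@dataclass
class DQNConfig:
    # Values mirror the recommended hyperparameters in the table above
    learning_rate: float = 2.5e-4
    batch_size: int = 32
    buffer_size: int = 100_000
    target_update_freq: int = 10_000
    epsilon_start: float = 1.0
    epsilon_end: float = 0.1
    epsilon_decay_steps: int = 100_000
    gamma: float = 0.99
    frame_stack: int = 4

config = DQNConfig()
print(config)
```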
Your agent will be evaluated on:
Correctness (30%):
Performance (25%):
Code Quality (15%):
Analysis (20%):
Documentation (10%):
Reference implementations: DLR-RM/stable-baselines3, vwxyzjn/cleanrl

Code Repository (GitHub/GitLab)
Trained Model
Demo Video (3-5 minutes; see the recording sketch below)
Technical Report (2-3 pages)
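For the demo video deliverable, gymnasium's `RecordVideo` wrapper can capture evaluation episodes as MP4 files. The sketch below assumes a `make_env` helper like the one in the environment setup sketch and an agent whose `select_action` can be run greedily; wrapper arguments may differ slightly across gymnasium versions.

```python
# Sketch of recording evaluation episodes for the demo video (e.g. in evaluate.py)
from gymnasium.wrappers import RecordVideo

def record_demo(make_env, agent, num_episodes=3, video_dir="videos"):
    env = make_env(render_mode="rgb_array")             # frames needed for video capture
    env = RecordVideo(env, video_folder=video_dir,
                      episode_trigger=lambda ep: True)  # record every episode
    for _ in range(num_episodes):
        obs, info = env.reset()
        done = False
        while not done:
            action = agent.select_action(obs, epsilon=0.0)  # greedy policy for evaluation
            obs, reward, terminated, truncated, info = env.step(action)
            done = terminated or truncated
    env.close()
```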
GitHub Repository Structure:
project-01-dqn-game-master/
├── README.md
├── requirements.txt
├── dqn_agent.py
├── replay_buffer.py
├── train.py
├── evaluate.py
├── models/
│ └── dqn_best.pth
├── logs/
│ └── tensorboard/
├── videos/
│ └── agent_demo.mp4
└── report.pdf
Submission Link: [Google Form/LMS Upload Link]
Deadline: 2 weeks from project start date
❌ Forgetting frame stacking: The agent needs 4 stacked frames to perceive motion
❌ Too large a learning rate: Causes unstable training and divergence
❌ Too small a replay buffer: Reduces the diversity of training data
❌ Not clipping rewards: Atari games have large reward variance (see the snippet below)
❌ Training too short: Many games need >500K steps to learn
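For the reward-clipping point above, a common approach (following the original DQN setup) is to clip each reward to [-1, 1] inside the training loop before the transition is stored, for example:

```python
import numpy as np

# Inside the training loop, before storing the transition:
# keep the loss scale comparable across games with very different score ranges.
clipped_reward = float(np.clip(reward, -1.0, 1.0))
agent.replay_buffer.push(state, action, clipped_reward, next_state, done)
```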
After completing this project, create a portfolio piece:
Demo Website/GitHub README:
LinkedIn/Resume Bullet:
"Implemented Deep Q-Network (DQN) agent achieving superhuman performance on Atari games, demonstrating expertise in deep reinforcement learning, experience replay, and neural network optimization."
After completing this project:
Good luck! You're implementing the algorithm that started the deep RL revolution. This project will give you hands-on experience with deep reinforcement learning and prepare you for real-world RL applications.
Related Projects: