Apply your knowledge to build something amazing!
Duration: 2 weeks
Points: 100
Prerequisites: Complete Lessons 1-3 (RL Fundamentals, Q-Learning, DQN)
Difficulty: Intermediate
In this project, you'll train a Deep Q-Network (DQN) agent to master an Atari game from raw pixel observations. You'll implement the complete DQN algorithm including experience replay, target networks, and epsilon-greedy exploration. Your agent will learn to play at human-level or beyond, demonstrating the power of deep reinforcement learning.
Why This Matters: DQN was the breakthrough algorithm that showed deep learning could solve complex RL tasks directly from high-dimensional inputs (images). This project replicates the DeepMind research that started the modern deep RL revolution.
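Before diving in, here is a minimal sketch of the epsilon-greedy action selection mentioned above. The standalone `select_action` helper and its arguments are illustrative assumptions; in your project this logic will typically live as a method on your agent class in `dqn_agent.py`.

```python
import random
import numpy as np
import torch

def select_action(q_network, state, epsilon, num_actions, device="cpu"):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily.

    Hypothetical helper for illustration; adapt it to your own agent class.
    """
    if random.random() < epsilon:
        return random.randrange(num_actions)  # explore: uniform random action
    with torch.no_grad():
        # state: (4, 84, 84) stacked frames -> add a batch dimension
        state_t = torch.as_tensor(np.asarray(state), dtype=torch.float32,
                                  device=device).unsqueeze(0)
        q_values = q_network(state_t)              # shape: (1, num_actions)
        return int(q_values.argmax(dim=1).item()) # exploit: highest Q-value
```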
What You'll Build:
By completing this project, you will:
Your DQN agent must:
Your implementation must include:
project-01-dqn-game-master/
├── README.md # Project documentation
├── requirements.txt # Python dependencies
├── dqn_agent.py # DQN agent implementation
├── replay_buffer.py # Experience replay buffer
├── train.py # Training script
├── evaluate.py # Evaluation and video generation
├── models/ # Saved model checkpoints
├── logs/ # TensorBoard logs
└── videos/ # Recorded gameplay videos
| Criterion | Points | Description |
|---|---|---|
| DQN Implementation | 30 | Correct DQN algorithm with experience replay and target network |
| Performance | 25 | Agent reaches target performance threshold |
| Code Quality | 15 | Clean, modular, well-documented code |
| Hyperparameter Tuning | 10 | Evidence of systematic tuning (log results) |
| Visualization | 10 | TensorBoard logs and demo video |
| Documentation | 10 | Clear README with setup instructions and results |
| Total | 100 | |
Bonus Points (+10 each):
Day 1-2: Environment Setup (see the environment setup sketch after this timeline)
Day 3-4: DQN Components
Day 5-7: Training Pipeline
Deliverable: Working training script that logs to TensorBoard
Day 8-10: Hyperparameter Tuning
Day 11-12: Extended Training
Day 13-14: Evaluation and Documentation
Deliverable: Trained agent, demo video, complete documentation
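For the Day 1-2 environment setup, here is a minimal sketch using gymnasium's Atari wrappers. The exact package extras, environment IDs, and wrapper names (`FrameStackObservation` in recent gymnasium releases, `FrameStack` in older ones) vary by version, so treat this as a starting point rather than the required configuration. Note that the training skeleton later in this document uses the older gym `reset()`/`step()` signatures, so adapt whichever side matches your installed version.

```python
# Minimal Atari environment setup sketch, assuming gymnasium + ale-py are installed
# (e.g. pip install "gymnasium[atari]" plus the Atari ROMs for your version).
import gymnasium as gym
from gymnasium.wrappers import AtariPreprocessing, FrameStackObservation
# On some gymnasium/ale-py versions you may also need:
#   import ale_py; gym.register_envs(ale_py)

def make_env(env_id="ALE/Breakout-v5", render_mode=None):
    # frameskip=1 on the base env because AtariPreprocessing applies its own frame skip
    env = gym.make(env_id, frameskip=1, render_mode=render_mode)
    env = AtariPreprocessing(env, frame_skip=4, screen_size=84,
                             grayscale_obs=True, scale_obs=False)
    env = FrameStackObservation(env, stack_size=4)  # observations become (4, 84, 84)
    return env

if __name__ == "__main__":
    env = make_env()
    obs, info = env.reset()     # gymnasium reset() returns (obs, info)
    print(obs.shape)            # expected: (4, 84, 84)
```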
import torch
import torch.nn as nn
class DQN(nn.Module):
    def __init__(self, num_actions):
        super().__init__()
        # Input: 4 stacked frames of 84x84 grayscale images
        self.conv1 = nn.Conv2d(4, 32, kernel_size=8, stride=4)   # Output: 32x20x20
        self.conv2 = nn.Conv2d(32, 64, kernel_size=4, stride=2)  # Output: 64x9x9
        self.conv3 = nn.Conv2d(64, 64, kernel_size=3, stride=1)  # Output: 64x7x7
        # Fully connected layers
        self.fc1 = nn.Linear(64 * 7 * 7, 512)
        self.fc2 = nn.Linear(512, num_actions)

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        x = torch.relu(self.conv3(x))
        x = x.view(x.size(0), -1)  # Flatten
        x = torch.relu(self.fc1(x))
        q_values = self.fc2(x)
        return q_values
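A quick sanity check with a dummy batch confirms the 64 * 7 * 7 flattened size used by `fc1` (the action count of 6 here is just an example):

```python
# Usage example for the DQN class above: verify output shapes with a dummy batch
net = DQN(num_actions=6)             # e.g. 6 actions; use env.action_space.n in practice
dummy = torch.zeros(2, 4, 84, 84)    # (batch, stacked frames, height, width)
q_values = net(dummy)
print(q_values.shape)                # expected: torch.Size([2, 6])
```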
import numpy as np
import random
from collections import deque
class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # Append one transition; the oldest is evicted once capacity is reached
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniformly sample a minibatch and regroup it into per-field arrays
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return (
            np.array(states),
            np.array(actions),
            np.array(rewards),
            np.array(next_states),
            np.array(dones)
        )

    def __len__(self):
        return len(self.buffer)
from torch.utils.tensorboard import SummaryWriter

def train_dqn(env, agent, num_steps=1_000_000):
    writer = SummaryWriter(log_dir='logs/tensorboard')
    state = env.reset()  # older gym API; gymnasium's reset() returns (obs, info)
    episode_reward = 0
    episode_count = 0
    loss = None  # no updates happen until the buffer holds enough transitions

    for step in range(num_steps):
        # Select action with epsilon-greedy (linear decay from 1.0 to 0.1 over 100k steps)
        epsilon = max(0.1, 1.0 - 0.9 * step / 100_000)
        action = agent.select_action(state, epsilon)

        # Execute action (older gym API; gymnasium's step() returns
        # obs, reward, terminated, truncated, info)
        next_state, reward, done, info = env.step(action)
        episode_reward += reward

        # Store transition
        agent.replay_buffer.push(state, action, reward, next_state, done)

        # Train on a batch once the buffer is large enough
        if len(agent.replay_buffer) > agent.batch_size:
            loss = agent.update()

        # Periodically sync the target network with the online network
        if step % agent.target_update_freq == 0:
            agent.update_target_network()

        # Log metrics
        if step % 1000 == 0 and loss is not None:
            writer.add_scalar('Loss', loss, step)
            writer.add_scalar('Epsilon', epsilon, step)

        # Handle episode end
        if done:
            writer.add_scalar('Reward', episode_reward, episode_count)
            episode_reward = 0
            episode_count += 1
            state = env.reset()
        else:
            state = next_state
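The loop above calls `agent.update()` and `agent.update_target_network()`, which are left for you to implement. Below is a minimal sketch of one possible update step; the attribute names (`q_network`, `target_network`, `optimizer`, `gamma`) are assumptions about how you might organize `dqn_agent.py`, not a required interface.

```python
import torch
import torch.nn.functional as F

def dqn_update(agent):
    """One gradient step on a sampled minibatch (a sketch, not the only valid design)."""
    states, actions, rewards, next_states, dones = agent.replay_buffer.sample(agent.batch_size)

    states = torch.as_tensor(states, dtype=torch.float32)
    actions = torch.as_tensor(actions, dtype=torch.int64)
    rewards = torch.as_tensor(rewards, dtype=torch.float32)
    next_states = torch.as_tensor(next_states, dtype=torch.float32)
    dones = torch.as_tensor(dones, dtype=torch.float32)

    # Q(s, a) for the actions actually taken
    q_values = agent.q_network(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Bootstrapped target: r + gamma * max_a' Q_target(s', a'), zeroed at terminal states
    with torch.no_grad():
        next_q = agent.target_network(next_states).max(dim=1).values
        targets = rewards + agent.gamma * next_q * (1.0 - dones)

    loss = F.smooth_l1_loss(q_values, targets)  # Huber loss, commonly used for DQN
    agent.optimizer.zero_grad()
    loss.backward()
    agent.optimizer.step()
    return loss.item()

def update_target_network(agent):
    # Hard copy of online weights into the target network
    agent.target_network.load_state_dict(agent.q_network.state_dict())
```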
| Hyperparameter | Recommended Value | Notes |
|---|---|---|
| Learning rate | 2.5e-4 | RMSprop optimizer |
| Batch size | 32 | Smaller is more sample efficient |
| Replay buffer size | 100,000 | Balance memory and diversity |
| Target update frequency | 10,000 steps | Too frequent -> unstable |
| Epsilon start | 1.0 | Full exploration initially |
| Epsilon end | 0.1 | Small exploration always |
| Epsilon decay steps | 100,000 | ~10% of total training |
| Discount factor (γ) | 0.99 | Standard for Atari |
| Frame stack | 4 | Capture motion information |
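One convenient way to keep tuning organized is to gather the values from the table above into a single config object. The dataclass below is just one possible layout, and the field names are illustrative rather than required.

```python
from dataclasses import dataclass

@dataclass
class DQNConfig:
    # Values mirror the recommended hyperparameters in the table above
    learning_rate: float = 2.5e-4
    batch_size: int = 32
    buffer_size: int = 100_000
    target_update_freq: int = 10_000
    epsilon_start: float = 1.0
    epsilon_end: float = 0.1
    epsilon_decay_steps: int = 100_000
    gamma: float = 0.99
    frame_stack: int = 4

config = DQNConfig()
print(config)
```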
Your agent will be evaluated on:
Correctness (30%):
Performance (25%):
Code Quality (15%):
Analysis (20%):
Documentation (10%):
Reference implementations: DLR-RM/stable-baselines3, vwxyzjn/cleanrl

Code Repository (GitHub/GitLab)
Trained Model
Demo Video (3-5 minutes; see the recording sketch below)
Technical Report (2-3 pages)
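For the demo video deliverable, gymnasium's `RecordVideo` wrapper can capture evaluation episodes as MP4 files. The sketch below assumes a `make_env` helper like the one in the environment setup sketch and an agent whose `select_action` can be run greedily; wrapper arguments may differ slightly across gymnasium versions.

```python
# Sketch of recording evaluation episodes for the demo video (e.g. in evaluate.py)
from gymnasium.wrappers import RecordVideo

def record_demo(make_env, agent, num_episodes=3, video_dir="videos"):
    env = make_env(render_mode="rgb_array")             # frames needed for video capture
    env = RecordVideo(env, video_folder=video_dir,
                      episode_trigger=lambda ep: True)  # record every episode
    for _ in range(num_episodes):
        obs, info = env.reset()
        done = False
        while not done:
            action = agent.select_action(obs, epsilon=0.0)  # greedy policy for evaluation
            obs, reward, terminated, truncated, info = env.step(action)
            done = terminated or truncated
    env.close()
```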
GitHub Repository Structure:
project-01-dqn-game-master/
├── README.md
├── requirements.txt
├── dqn_agent.py
├── replay_buffer.py
├── train.py
├── evaluate.py
├── models/
│ └── dqn_best.pth
├── logs/
│ └── tensorboard/
├── videos/
│ └── agent_demo.mp4
└── report.pdf
Submission Link: [Google Form/LMS Upload Link]
Deadline: 2 weeks from project start date
❌ Forgetting frame stacking: The agent needs 4 stacked frames to perceive motion
❌ Too large a learning rate: Causes unstable training and divergence
❌ Too small a replay buffer: Reduces the diversity of training data
❌ Not clipping rewards: Atari games have large reward variance (see the snippet below)
❌ Training too short: Many games need >500K steps to learn
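For the reward-clipping point above, a common approach (following the original DQN setup) is to clip each reward to [-1, 1] inside the training loop before the transition is stored, for example:

```python
import numpy as np

# Inside the training loop, before storing the transition:
# keep the loss scale comparable across games with very different score ranges.
clipped_reward = float(np.clip(reward, -1.0, 1.0))
agent.replay_buffer.push(state, action, clipped_reward, next_state, done)
```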
After completing this project, create a portfolio piece:
Demo Website/GitHub README:
LinkedIn/Resume Bullet:
"Implemented Deep Q-Network (DQN) agent achieving superhuman performance on Atari games, demonstrating expertise in deep reinforcement learning, experience replay, and neural network optimization."
After completing this project:
Good luck! You're implementing the algorithm that started the deep RL revolution. This project will give you hands-on experience with deep reinforcement learning and prepare you for real-world RL applications.
Related Projects: