AI25: Modern Artificial Intelligence

Master advanced AI paradigms through reinforcement learning and generative AI. Build intelligent agents, train game-playing AI, create generative models, and understand how RL and GenAI converge through RLHF.

Learning Outcomes

Build reinforcement learning agents using PyTorch and Gymnasium
Implement generative models (VAEs, GANs, Diffusion, Transformers)
Understand RLHF and modern AI alignment techniques
Deploy AI agents with safety considerations

Prerequisites

Required: AI2 completion (or equivalent ML/CV background)
Python proficiency with PyTorch basics
Understanding of neural networks and optimization
Google account for Google Colab (Pro recommended)

Recommended For

Age Group: 14-18 years
Prior Experience: AI2 or ML fundamentals

AI25: Modern Artificial Intelligence - Session Logistics

Session	Topic	Objectives	Notes/File Link	Activities Link	Projects	Exit Ticket	Submission Link
1	Introduction to Reinforcement Learning	Agent-environment interaction, MDP basics	Concept 01: Introduction to Reinforcement Learning	Activity 01: Introduction to Reinforcement Learning	-	ET-1	Submit Activity
2	Q-Learning and Value Functions	Tabular Q-Learning, epsilon-greedy exploration	Concept 02: Q-Learning and Value Functions	Activity 02: Q-Learning and Value Functions	-	ET-2	Submit Activity
3	Deep Q-Networks (DQN)	Function approximation, experience replay	Concept 03: Deep Q-Networks (DQN)	Activity 03: Deep Q-Networks (DQN)	-	ET-3	Submit Activity
4	Project 1: DQN Game Master	Train a DQN agent to master an Atari game	-	-	Project 1: DQN Game Master	-	Submit Project
5	Policy Gradient Methods	Policy-based RL, REINFORCE algorithm	Concept 04: Policy Gradient Methods	Activity 04: Policy Gradient Methods	-	ET-4	Submit Activity
6	Actor-Critic Methods	Actor-critic architecture, advantage estimation	Concept 05: Actor-Critic Methods	Activity 05: Actor-Critic Methods	-	ET-5	Submit Activity
7	Proximal Policy Optimization (PPO)	Trust region methods, PPO clipping	Concept 06: Proximal Policy Optimization (PPO)	Activity 06: Proximal Policy Optimization (PPO)	-	ET-6	Submit Activity
8	Project 2: Autonomous Robot Navigation	PPO agent for continuous control	-	-	Project 2: Autonomous Robot Navigation	-	Submit Project
9	Multi-Armed Bandits and Exploration	Bandit problems, UCB, contextual bandits	Concept 07: Multi-Armed Bandits and Exploration	Activity 07: Multi-Armed Bandits and Exploration	-	ET-7	Submit Activity
10	RL in Practice - Debugging and Deployment	Debug RL failures, reward shaping	Concept 08: RL in Practice - Debugging and Deployment	Activity 08: RL in Practice - Debugging and Deployment	-	ET-8	Submit Activity
11	Introduction to Generative Models	Generative vs discriminative, latent spaces	Concept 09: Introduction to Generative Models	Activity 09: Introduction to Generative Models	-	ET-9	Submit Activity
12	Variational Autoencoders (VAEs)	Encoder-decoder, reparameterization trick	Concept 10: Variational Autoencoders (VAEs)	Activity 10: Variational Autoencoders (VAEs)	-	ET-10	Submit Activity
13	Projects 3-4: Generative Models Workshop	GAN art generation and VAE latent space exploration	-	-	Project 3: GAN Art Studio + Project 4: Latent Space Explorer	-	Submit Project
14	Generative Adversarial Networks (GANs)	Adversarial training, minimax objective	Concept 11: Generative Adversarial Networks (GANs)	Activity 11: Generative Adversarial Networks (GANs)	-	ET-11	Submit Activity
15	Advanced GAN Architectures	StyleGAN features, conditional GANs, WGAN-GP	Concept 12: Advanced GAN Architectures	Activity 12: Advanced GAN Architectures	-	ET-12	Submit Activity
16	Diffusion Models	Denoising diffusion, U-Net architecture	Concept 13: Diffusion Models	Activity 13: Diffusion Models	-	ET-13	Submit Activity
17	Transformer Architectures for Generation	Self-attention, autoregressive generation	Concept 14: Transformer Architectures for Generation	Activity 14: Transformer Architectures for Generation	-	ET-14	Submit Activity
18	Large Language Models (LLMs) Fundamentals	LLM architecture, prompt engineering	Concept 15: Large Language Models (LLMs) Fundamentals	Activity 15: Large Language Models (LLMs) Fundamentals	-	ET-15	Submit Activity
19	Reinforcement Learning from Human Feedback (RLHF)	3-stage RLHF pipeline, reward model training	Concept 16: Reinforcement Learning from Human Feedback (RLHF)	Activity 16: Reinforcement Learning from Human Feedback (RLHF)	-	ET-16	Submit Activity
20	Project 5: Text Generation with RLHF	Align an LLM with human preferences	-	-	Project 5: Text Generation with RLHF	-	Submit Project
21	Multi-Modal AI - Vision and Language	Cross-modal learning, CLIP, text-to-image	Concept 17: Multi-Modal AI - Vision and Language	Activity 17: Multi-Modal AI - Vision and Language	-	ET-17	Submit Activity
22	Project 6: Multi-Modal Content Generator	Text-to-image and image-to-text pipelines	-	-	Project 6: Multi-Modal Content Generator	-	Submit Project
23	The Future of AI - Integration and Ethics	RL + GenAI integration, AI safety	Concept 18: The Future of AI - Integration and Ethics	Activity 18: The Future of AI - Integration and Ethics	-	ET-18	Submit Activity
24	Project 7: Capstone - AI Agent Ecosystem	Integrated AI system (student-designed)	-	-	Project 7: Capstone - AI Agent Ecosystem	-	Submit Project

Resources

Course Structure

Module 1: Reinforcement Learning (Sessions 1-10)
Module 2: Generative AI (Sessions 11-18)
Module 3: Convergence - RLHF & Modern AI (Sessions 19-24)
Projects: 7 portfolio-building projects
Duration: 12-16 weeks at 8-10 hours per week