ℹ️ Definition Generative Adversarial Networks (GANs) are generative models that train two neural networks in competition: a generator that creates fake samples and a discriminator that distinguishes real from fake, resulting in highly realistic sample generation through adversarial training.
By the end of this lesson, you will:
- Understand the adversarial game between a generator and a discriminator
- Derive the GAN minimax objective and the non-saturating generator loss
- Implement and train a simple GAN on MNIST in PyTorch
- Recognize common failure modes (mode collapse, instability, vanishing gradients) and standard stabilization tricks
- Evaluate GANs with Inception Score and FID, and compare them to VAEs
In Lesson 10, we studied VAEs, generative models with stable training but blurry outputs. GANs take a radically different approach: instead of maximizing likelihood, two networks compete in an adversarial game.
Result: State-of-the-art sample quality, but challenging to train!
GAN Applications:
- Photorealistic image synthesis (faces, scenes, artwork)
- Image-to-image translation (e.g., sketches to photos, day to night)
- Super-resolution and image inpainting
- Data augmentation for training other models
Generator (G):
- Takes random noise z sampled from a simple prior (e.g., a Gaussian)
- Outputs a fake sample G(z)
- Goal: produce samples the discriminator classifies as real
Discriminator (D):
- Takes a sample x (real from the dataset, or fake from the generator)
- Outputs D(x), the probability that x is real
- Goal: correctly separate real samples from fakes
Training: Generator and discriminator improve simultaneously through competition.

1. Discriminator trains to distinguish real from fake
- Real samples from training data → Label 1 (real)
- Fake samples from generator → Label 0 (fake)
2. Generator trains to fool discriminator
- Generate fake samples
- Want discriminator to output 1 (thinks they're real)
3. Repeat until Nash equilibrium
Round 1:
Generator: Creates obvious fakes
Discriminator: Easily spots fakes (100% accuracy)
Round 100:
Generator: Creates better fakes
Discriminator: Still catching most fakes (80% accuracy)
Round 10,000:
Generator: Creates photorealistic fakes
Discriminator: Can't reliably distinguish (50% accuracy = random guessing)
This is the Nash equilibrium: the discriminator can do no better than random guessing, so the generator has effectively won.
GAN objective:
min_G max_D V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]
Interpretation:
Discriminator maximizes:
E_x[log D(x)] + E_z[log(1 - D(G(z)))], i.e., push D(x) toward 1 on real samples and D(G(z)) toward 0 on fakes.
Generator minimizes:
E_z[log(1 - D(G(z)))], i.e., push D(G(z)) toward 1 so the discriminator labels fakes as real.
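A useful known result (from the original GAN paper, Goodfellow et al., 2014): for a fixed generator, the optimal discriminator has a closed form:

D*(x) = p_data(x) / (p_data(x) + p_g(x))

At the global optimum the generator distribution matches the data distribution (p_g = p_data), so D*(x) = 1/2 everywhere, which is exactly the "50% accuracy = random guessing" point described above.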
Loss function:
L_D = -[E_x[log D(x)] + E_z[log(1 - D(G(z)))]]
In practice:
# Real samples
real_preds = D(real_images)
real_loss = F.binary_cross_entropy(real_preds, torch.ones_like(real_preds))

# Fake samples
fake_images = G(noise)
fake_preds = D(fake_images.detach())
fake_loss = F.binary_cross_entropy(fake_preds, torch.zeros_like(fake_preds))

d_loss = real_loss + fake_loss
Key: Use .detach() to prevent gradients flowing to generator during discriminator update.
Original objective:
L_G = E_z[log(1 - D(G(z)))]
Problem: Vanishing gradients when D is confident (D(G(z)) ≈ 0)
Better objective (non-saturating):
L_G = -E_z[log D(G(z))]
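To see why the original loss saturates, write D(G(z)) = sigmoid(a), where a is the discriminator's logit on a fake sample. Then:

d/da log(1 - sigmoid(a)) = -sigmoid(a)    → 0 as D(G(z)) → 0
d/da log(sigmoid(a))     = 1 - sigmoid(a) → 1 as D(G(z)) → 0

So the original loss gives the generator almost no gradient exactly when its fakes are easy to spot (early in training), while the non-saturating loss keeps the signal strong.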
Implementation:
fake_images = G(noise)
fake_preds = D(fake_images)  # no detach: gradients must flow back into G
g_loss = F.binary_cross_entropy(fake_preds, torch.ones_like(fake_preds))
Interpretation: Train generator to maximize probability that discriminator thinks fakes are real.
Input: Random noise z (latent code)
Output: Generated sample G(z)
MNIST Generator Example:
class Generator(nn.Module):
def __init__(self, latent_dim=100):
super().__init__()
self.model = nn.Sequential(
nn.Linear(latent_dim, 256),
nn.LeakyReLU(0.2),
nn.Linear(256, 512),
nn.LeakyReLU(0.2),
nn.Linear(512, 1024),
nn.LeakyReLU(0.2),
nn.Linear(1024, 784), # 28x28 = 784
nn.Tanh() # Output in [-1, 1]
)
def forward(self, z):
img = self.model(z)
return img.view(-1, 1, 28, 28)
Input: Sample x (real or fake)
Output: Probability D(x) that x is real
MNIST Discriminator Example:
class Discriminator(nn.Module):
def __init__(self):
super().__init__()
self.model = nn.Sequential(
nn.Linear(784, 512),
nn.LeakyReLU(0.2),
nn.Dropout(0.3),
nn.Linear(512, 256),
nn.LeakyReLU(0.2),
nn.Dropout(0.3),
nn.Linear(256, 1),
nn.Sigmoid() # Output probability
)
def forward(self, img):
img_flat = img.view(img.size(0), -1)
validity = self.model(img_flat)
return validity
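A quick shape sanity check (a minimal sketch, assuming the Generator and Discriminator classes defined above):

import torch

generator = Generator(latent_dim=100)
discriminator = Discriminator()

z = torch.randn(16, 100)          # batch of 16 latent codes
fake = generator(z)               # expected shape: (16, 1, 28, 28)
validity = discriminator(fake)    # expected shape: (16, 1), values in (0, 1)
print(fake.shape, validity.shape)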
import torch
import torch.nn as nn
import torch.optim as optim
# Initialize models
latent_dim = 100
generator = Generator(latent_dim)
discriminator = Discriminator()
# Optimizers
g_optimizer = optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
d_optimizer = optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))
# Loss function
criterion = nn.BCELoss()
# Training loop
for epoch in range(num_epochs):
for i, (real_images, _) in enumerate(dataloader):
batch_size = real_images.size(0)
# Real and fake labels
real_labels = torch.ones(batch_size, 1)
fake_labels = torch.zeros(batch_size, 1)
# ---------------------
# Train Discriminator
# ---------------------
d_optimizer.zero_grad()
# Real images
real_outputs = discriminator(real_images)
d_real_loss = criterion(real_outputs, real_labels)
# Fake images
z = torch.randn(batch_size, latent_dim)
fake_images = generator(z)
fake_outputs = discriminator(fake_images.detach()) # Detach!
d_fake_loss = criterion(fake_outputs, fake_labels)
# Total discriminator loss
d_loss = d_real_loss + d_fake_loss
d_loss.backward()
d_optimizer.step()
# -----------------
# Train Generator
# -----------------
g_optimizer.zero_grad()
# Generate fake images
z = torch.randn(batch_size, latent_dim)
fake_images = generator(z)
fake_outputs = discriminator(fake_images)
# Generator loss (fool discriminator)
g_loss = criterion(fake_outputs, real_labels) # Want D(G(z)) = 1
g_loss.backward()
g_optimizer.step()
print(f"Epoch [{epoch}/{num_epochs}], d_loss: {d_loss.item():.4f}, g_loss: {g_loss.item():.4f}")
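During or after training, it helps to periodically sample the generator and look at the images rather than the losses. A minimal sketch using torchvision's save_image (assuming the trained generator and latent_dim from above):

import torchvision.utils as vutils

generator.eval()
with torch.no_grad():
    z = torch.randn(64, latent_dim)
    samples = generator(z)                       # (64, 1, 28, 28), values in [-1, 1] from Tanh
vutils.save_image(samples, "samples.png", nrow=8, normalize=True)  # normalize rescales to [0, 1]
generator.train()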

Problem: Generator produces only a limited variety of samples (mode collapse)
Symptoms:
- Many different latent codes z map to nearly identical outputs
- Generated samples cover only a few "modes" of the training data (e.g., only a few digit classes)
Causes:
- The generator finds a few outputs that reliably fool the current discriminator and keeps reproducing them instead of covering the full data distribution
Solutions:
- Feature matching and experience replay (see the stabilization tricks below)
- Minibatch discrimination, i.e., letting the discriminator look at diversity within a batch
- Monitoring diversity explicitly with metrics such as FID (see the evaluation section)
Problem: Losses oscillate wildly, training diverges
Symptoms:
- d_loss and g_loss swing between extremes from batch to batch
- Sample quality improves, then suddenly collapses
Causes:
- Learning rates that are too high, or a large capacity/speed mismatch between G and D
- The two players keep overshooting each other's updates
Solutions:
- Lower learning rates with Adam betas=(0.5, 0.999), as in the training loop above
- Label smoothing and instance noise (see below)
- Balance how often (or how strongly) each network is updated
Problem: Generator stops learning
Symptoms:
- g_loss stays flat or keeps rising while d_loss drops toward zero
- Generated samples stop improving early in training
Causes:
- The discriminator becomes too strong, so D(G(z)) ≈ 0 and the generator receives vanishing gradients (especially with the original log(1 - D(G(z))) loss)
Solutions:
- Use the non-saturating generator loss introduced above
- Weaken the discriminator (dropout, fewer or smaller updates)
- Add instance noise to the discriminator's inputs (see below)
Problem: Training doesn't converge to Nash equilibrium
Symptoms:
- Losses cycle indefinitely without settling
- Sample quality oscillates rather than steadily improving
Causes:
- Simultaneous gradient updates on a minimax game have no convergence guarantee; the two players can chase each other in circles
Solutions:
- Historical averaging and experience replay (see below)
- Careful learning-rate tuning
- Judge progress by inspecting samples and metrics (IS, FID), not by the raw losses alone
Problem: Hard labels (0, 1) can cause overconfidence
Solution: Smooth labels
real_labels = torch.ones(batch_size, 1) * 0.9 # Instead of 1.0
fake_labels = torch.zeros(batch_size, 1) + 0.1 # Instead of 0.0
Add noise to discriminator inputs:
# Add Gaussian noise to real and fake images
noisy_real = real_images + torch.randn_like(real_images) * noise_std
noisy_fake = fake_images + torch.randn_like(fake_images) * noise_std
Decay noise over training:
noise_std = initial_noise * (decay_rate ** epoch)
Only smooth real labels:
real_labels = torch.rand(batch_size, 1) * 0.1 + 0.9 # [0.9, 1.0]
fake_labels = torch.zeros(batch_size, 1) # Keep at 0
Match statistics of intermediate features:
# Extract features from discriminator
real_features = discriminator.get_features(real_images)
fake_features = discriminator.get_features(fake_images)
# Generator loss: match feature statistics
g_loss = F.mse_loss(fake_features.mean(0), real_features.mean(0).detach())
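Note that get_features is not defined on the Discriminator shown earlier; one possible way to expose intermediate activations (a sketch reusing that MLP discriminator) is to run every layer except the final classifier:

class FeatureDiscriminator(Discriminator):
    def get_features(self, img):
        img_flat = img.view(img.size(0), -1)
        # All layers except the last Linear + Sigmoid: returns 256-dim features
        return self.model[:-2](img_flat)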
Penalize deviation from historical parameters:
g_loss_gan = criterion(fake_outputs, real_labels)
g_loss_history = sum(F.mse_loss(p, p_hist) for p, p_hist in zip(generator.parameters(), historical_params))
g_loss = g_loss_gan + lambda_history * g_loss_history
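One simple way to maintain historical_params is an exponential moving average (EMA) of the generator's weights; a sketch, where ema_decay is an assumed hyperparameter (e.g., 0.999):

# Initialize once before training
historical_params = [p.detach().clone() for p in generator.parameters()]

# Update after every generator step
ema_decay = 0.999
with torch.no_grad():
    for p_hist, p in zip(historical_params, generator.parameters()):
        p_hist.mul_(ema_decay).add_(p, alpha=1 - ema_decay)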
Train discriminator on mix of current and past fake samples:
import random  # for randint / choice below

# Store past fake samples
if len(replay_buffer) < buffer_size:
replay_buffer.append(fake_images.detach())
else:
replay_buffer[random.randint(0, buffer_size-1)] = fake_images.detach()
# Train discriminator on mix
past_batch = random.choice(replay_buffer)        # one previously stored batch of fakes
past_fakes = past_batch[:batch_size // 2]
current_fakes = fake_images[:batch_size // 2]
mixed_fakes = torch.cat([current_fakes, past_fakes])
From the DCGAN paper (Radford et al., 2015):

class DCGANGenerator(nn.Module):
def __init__(self, latent_dim=100, channels=1):
super().__init__()
self.model = nn.Sequential(
# Input: latent_dim x 1 x 1
nn.ConvTranspose2d(latent_dim, 512, 4, 1, 0, bias=False),
nn.BatchNorm2d(512),
nn.ReLU(True),
# State: 512 x 4 x 4
nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False),
nn.BatchNorm2d(256),
nn.ReLU(True),
# State: 256 x 8 x 8
nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),
nn.BatchNorm2d(128),
nn.ReLU(True),
# State: 128 x 16 x 16
nn.ConvTranspose2d(128, channels, 4, 2, 1, bias=False),
nn.Tanh()
# Output: channels x 32 x 32
)
def forward(self, z):
return self.model(z.view(-1, z.size(1), 1, 1))
class DCGANDiscriminator(nn.Module):
def __init__(self, channels=1):
super().__init__()
self.model = nn.Sequential(
# Input: channels x 32 x 32
nn.Conv2d(channels, 128, 4, 2, 1, bias=False),
nn.LeakyReLU(0.2, inplace=True),
# State: 128 x 16 x 16
nn.Conv2d(128, 256, 4, 2, 1, bias=False),
nn.BatchNorm2d(256),
nn.LeakyReLU(0.2, inplace=True),
# State: 256 x 8 x 8
nn.Conv2d(256, 512, 4, 2, 1, bias=False),
nn.BatchNorm2d(512),
nn.LeakyReLU(0.2, inplace=True),
# State: 512 x 4 x 4
nn.Conv2d(512, 1, 4, 1, 0, bias=False),
nn.Sigmoid()
# Output: 1 x 1 x 1
)
def forward(self, img):
return self.model(img).view(-1, 1)
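As a quick sanity check on the tensor shapes (a sketch assuming the two DCGAN classes above):

import torch

G = DCGANGenerator(latent_dim=100, channels=1)
D = DCGANDiscriminator(channels=1)

z = torch.randn(8, 100)
imgs = G(z)          # expected: (8, 1, 32, 32)
scores = D(imgs)     # expected: (8, 1)
print(imgs.shape, scores.shape)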
Problem: Standard GAN can't control what to generate
Solution: Condition both networks on labels/attributes
Generator:
G(z, y) → x
Discriminator:
D(x, y) → probability x is real given label y
class ConditionalGenerator(nn.Module):
def __init__(self, latent_dim=100, n_classes=10):
super().__init__()
self.label_emb = nn.Embedding(n_classes, n_classes)
self.model = nn.Sequential(
nn.Linear(latent_dim + n_classes, 256),
nn.LeakyReLU(0.2),
nn.Linear(256, 512),
nn.LeakyReLU(0.2),
nn.Linear(512, 1024),
nn.LeakyReLU(0.2),
nn.Linear(1024, 784),
nn.Tanh()
)
def forward(self, z, labels):
# Concatenate noise and label embedding
label_input = self.label_emb(labels)
gen_input = torch.cat([z, label_input], dim=1)
img = self.model(gen_input)
return img.view(-1, 1, 28, 28)
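The matching conditional discriminator is not shown above; it can condition on the label in the same way, by concatenating a label embedding with the flattened image. A minimal sketch for the same MNIST setup:

class ConditionalDiscriminator(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.label_emb = nn.Embedding(n_classes, n_classes)
        self.model = nn.Sequential(
            nn.Linear(784 + n_classes, 512),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.3),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )

    def forward(self, img, labels):
        # Concatenate flattened image and label embedding
        img_flat = img.view(img.size(0), -1)
        d_input = torch.cat([img_flat, self.label_emb(labels)], dim=1)
        return self.model(d_input)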
Generate specific digits:
z = torch.randn(10, latent_dim)
labels = torch.LongTensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) # One of each digit
generated_digits = generator(z, labels)
Measures: Quality and diversity
Formula:
IS = exp(E_x[KL(p(y|x) || p(y))])
Interpretation: Higher is better. A high IS requires each generated sample to be confidently classified as some class (quality) while the predicted classes vary across samples (diversity).
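A minimal sketch of the computation, assuming probs is an (N, K) tensor of classifier softmax outputs p(y|x) for N generated samples (in practice these come from a pretrained Inception network):

import torch

def inception_score(probs, eps=1e-12):
    p_y = probs.mean(dim=0, keepdim=True)                                # marginal p(y)
    kl = (probs * ((probs + eps).log() - (p_y + eps).log())).sum(dim=1)  # KL(p(y|x) || p(y)) per sample
    return kl.mean().exp().item()                                        # IS = exp(E_x[KL])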
Measures: Similarity to real data distribution
Formula:
FID = ||μ_r - μ_g||² + Tr(Σ_r + Σ_g - 2√(Σ_r Σ_g))
Where μ and Σ are the mean and covariance of Inception features for real (r) and generated (g) samples, and √(Σ_r Σ_g) denotes the matrix square root
Interpretation: Lower is better. FID compares the real and generated feature distributions as Gaussians; FID = 0 means they match exactly, and FID is generally more robust and better correlated with human judgment than IS.
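A sketch of the core computation, assuming you have already extracted Inception features and computed their means and covariances for real (mu_r, sigma_r) and generated (mu_g, sigma_g) samples; scipy provides the matrix square root:

import numpy as np
from scipy import linalg

def fid(mu_r, sigma_r, mu_g, sigma_g):
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_g, disp=False)  # matrix square root of Σ_r Σ_g
    if np.iscomplexobj(covmean):
        covmean = covmean.real                                # drop negligible imaginary parts
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2 * covmean))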
Measures: Mode coverage (diversity)
Detects: Mode collapse
| Aspect | GANs | VAEs |
|---|---|---|
| Sample Quality | Sharper, more realistic | Blurry, averaged |
| Training Stability | Unstable, requires tricks | Stable, principled |
| Likelihood | No explicit likelihood | Explicit (ELBO) |
| Mode Coverage | Prone to mode collapse | Covers all modes |
| Latent Space | Less interpretable | Smooth, interpretable |
| Speed | Fast sampling | Fast sampling |
| Use Case | Image generation | Compression, anomaly detection |