Advanced GAN Architectures - Discovery Challenge

🎯 Learning Objectives

By completing this activity, you will:

Understand conditional GANs (cGAN) for class-controlled generation
Implement class embeddings and label conditioning mechanisms
Apply Wasserstein GAN with Gradient Penalty (WGAN-GP) for stable training
Master gradient penalty calculation to enforce Lipschitz constraint
Explore style mixing techniques from StyleGAN architecture
Evaluate class-conditional image generation quality on CIFAR-10

🚀 Getting Started (See Results in 30 Seconds!)

Open in Google Colab: Upload this notebook to Google Colab
Enable GPU: Runtime -> Change runtime type -> GPU (T4)
Run All Cells: Click Runtime -> Run all (or press Ctrl+F9)
Watch the Magic: You'll see:
- ✅ CIFAR-10 dataset loading with class labels
- ✅ Conditional GAN architecture initialized
- ✅ Training loop running on GPU
- ✅ Class-conditional generated images (airplane, car, bird, etc.)

Expected First Run Time: ~90 seconds (GPU initialization + 1 epoch)

🎯 What's Already Working

The template comes with 70% working code:

✅ CIFAR-10 Dataset: 60,000 images with 10 class labels loaded
✅ Conditional GAN Framework: Architecture skeleton for class-conditional generation
✅ Training Loop: Complete training harness with loss tracking
✅ Visualization Tools: Grid display for all 10 classes
✅ GPU Support: Automatic CUDA detection and model transfer
✅ Progress Tracking: Real-time loss plots and sample generation

What Needs Your Work (30%):

⚠️ TODO 1: Implement conditional generator (class embedding + noise concatenation)
⚠️ TODO 2: Implement conditional discriminator (image + label input)
⚠️ TODO 3: Add WGAN-GP loss (Wasserstein distance with gradient penalty)
⚠️ TODO 4: Implement style mixing (StyleGAN feature) - Extension

📋 Tasks to Complete

TODO 1: Implement Conditional Generator (Hard)

Location: Section 3 - "Conditional Generator Architecture"

Current State: Generator takes noise but ignores class labels

Your Task: Make generator class-conditional by:

Adding an embedding layer for class labels (10 classes -> 50-dim embedding)
Concatenating class embedding with noise vector (z + c)
Modifying input dimension to accommodate combined vector

Starter Code Provided:

python

class ConditionalGenerator(nn.Module):
    def __init__(self, latent_dim=100, num_classes=10, embed_dim=50):
        super().__init__()

        # TODO: Add embedding layer for class labels
        # self.label_embedding = nn.Embedding(num_classes, embed_dim)

        # TODO: Update input_dim to latent_dim + embed_dim

        # Generator layers (provided)
        self.model = nn.Sequential(...)

Key Concepts:

Embedding Layer: Maps discrete class IDs to continuous vectors
Concatenation: Combines noise and class info: torch.cat([z, c], dim=1)
Conditional Generation: Generator output depends on both z and class label

Success Criteria:

Embedding layer converts class labels (0-9) to 50-dim vectors
Generator input dimension = 100 (noise) + 50 (class) = 150
Generated images visually match requested class after training
Grid of 10 classes shows distinct visual differences

TODO 2: Implement Conditional Discriminator (Hard)

Location: Section 4 - "Conditional Discriminator Architecture"

Your Task: Modify discriminator to accept both image and class label:

Convert class label to one-hot encoding (10 classes -> 10-dim vector)
Spatially replicate one-hot vector to match image dimensions
Concatenate with image channels (3 RGB + 10 label = 13 channels)

Starter Code Provided:

python

class ConditionalDiscriminator(nn.Module):
    def forward(self, img, labels):
        # TODO: Convert labels to one-hot (batch_size, 10)
        # TODO: Reshape to (batch_size, 10, 1, 1)
        # TODO: Expand to (batch_size, 10, 32, 32) to match image size
        # TODO: Concatenate with img along channel dimension

        # Discriminator layers (provided)
        return self.model(x)

Key Concepts:

One-Hot Encoding: Class 3 -> [0,0,0,1,0,0,0,0,0,0]
Spatial Replication: Expand (B,10,1,1) -> (B,10,32,32) for pixel-wise conditioning
Channel Concatenation: (B,3,32,32) + (B,10,32,32) -> (B,13,32,32)

Success Criteria:

Discriminator accepts image tensor (B,3,32,32) and labels (B,)
One-hot encoding correctly represents class labels
Concatenated input has 13 channels (3 RGB + 10 classes)
Discriminator learns to reject mismatched image-label pairs

TODO 3: Implement WGAN-GP Loss (Very Hard)

Location: Section 5 - "WGAN-GP Loss Functions"

Your Task: Replace standard GAN loss with Wasserstein loss + gradient penalty:

3A. Wasserstein Distance:

python

# Discriminator loss = D(fake) - D(real) + λ * GP
# Generator loss = -D(fake)

3B. Gradient Penalty (enforce 1-Lipschitz constraint):

python

def gradient_penalty(discriminator, real_images, fake_images, labels, device):
    # Step 1: Interpolate between real and fake
    # alpha = torch.rand(batch_size, 1, 1, 1).to(device)
    # interpolated = alpha * real + (1-alpha) * fake

    # Step 2: Compute discriminator output on interpolated
    # Step 3: Compute gradients of output w.r.t. interpolated
    # Step 4: Calculate gradient norm
    # Step 5: Penalty = ((norm - 1)^2).mean()

Key Concepts:

Wasserstein Distance: Measures distribution similarity (more stable than JS divergence)
Lipschitz Constraint: Discriminator gradients must have ``norm <= 1``
Gradient Penalty: Soft constraint enforcing Lipschitz condition (λ=10)
WGAN-GP Benefits: Stable training, meaningful loss curves, no mode collapse

Success Criteria:

Discriminator loss = D(fake) - D(real) + 10 * gradient_penalty
Generator loss = -D(fake)
Gradient penalty calculated on interpolated samples
Gradient norm converges to ~1.0 during training
Training is more stable than vanilla GAN (no mode collapse)

Hints:

Use torch.autograd.grad() to compute gradients
Set create_graph=True to allow backprop through gradient computation
Interpolation: alpha * real + (1-alpha) * fake where alpha ~ Uniform(0,1)

TODO 4: Implement Style Mixing (Very Hard - Extension)

Location: New section you'll create

Your Task: Implement StyleGAN-inspired style mixing:

Generate image with two different noise vectors: z1 and z2
Use z1 for early layers (coarse features: shape, pose)
Switch to z2 at middle layers (fine features: color, texture)
Visualize how mixing point affects output

Requirements:

Modify generator to accept crossover layer index
Generate grid showing: [z1 only, mix at layer 2, mix at layer 4, z2 only]
Use different classes for z1 and z2 (e.g., "dog" structure with "car" texture)

Example Output:

java

Row 1: Dog (z1) → Dog-Car mix (early) → Dog-Car mix (late) → Car (z2)
Row 2: Bird (z1) → Bird-Ship mix → Bird-Ship mix → Ship (z2)

Success Criteria:

Generator supports style mixing with crossover layer parameter
Early mixing preserves coarse structure from z1
Late mixing preserves colors/textures from z2
Visualization shows smooth transition across mixing points
Code is well-commented explaining StyleGAN concepts

🚀 Extension Challenges

Once you've completed all TODOs, try these advanced challenges:

Challenge One: Progressive Growing GAN (Hard)

Implement progressive training from StyleGAN:

Start training at 8x8 resolution
Gradually increase to 16x16, then 32x32
Use fade-in layers during resolution transitions
Compare training stability vs fixed resolution

Benefits: Faster convergence, higher quality images, stable training

Challenge 2: Self-Attention Mechanism (Hard)

Add self-attention layers from SAGAN (Self-Attention GAN):

Implement attention module: Query, Key, Value projections
Add to both generator and discriminator at 16x16 feature maps
Visualize attention maps to see what model focuses on
Compare FID scores vs baseline cGAN

Paper: "Self-Attention Generative Adversarial Networks" (Zhang et al., 2019)

Challenge 3: FID Score Evaluation (Medium)

Implement Fréchet Inception Distance (FID) for objective quality measurement:

Load pretrained InceptionV3 model
Extract features for real and generated images
Calculate FID = ||μ_real - μ_fake||² + Tr(Σ_real + Σ_fake - 2√(Σ_real Σ_fake))
Track FID during training (lower is better)

Target: ``FID < 50`` is good for CIFAR-10, ``FID < 30`` is excellent

Challenge 4: BigGAN Features (Very Hard)

Implement components from BigGAN for high-quality generation:

Orthogonal Regularization: Add penalty term to encourage orthogonal weight matrices
Truncation Trick: Sample z from truncated normal distribution (reduce diversity, increase quality)
Class Embeddings in BatchNorm: Inject class info via conditional batch normalization
Compare quality/diversity tradeoffs

Paper: "Large Scale GAN Training for High Fidelity Natural Image Synthesis" (Brock et al., 2019)

📊 Expected Results

Standard GAN Baseline (for comparison):

Training Stability: Poor (mode collapse after ~20 epochs)
FID Score: 80-120 (high = poor quality)
Class Specificity: None (unconditional generation)

Your Conditional GAN (TODO 1-2):

Training Stability: Moderate (some instability)
FID Score: 50-80
Class Specificity: 70-80% accuracy (generated images match requested class)
Visual Quality: Recognizable CIFAR-10 objects

WGAN-GP (TODO 3):

Training Stability: Excellent (no mode collapse)
FID Score: 35-50 (significant improvement)
Loss Curves: Meaningful (negative = better)
Gradient Norms: Stable around 1.0

With Style Mixing (TODO 4):

Creative Control: Can mix attributes from different classes
Diversity: More varied outputs within same class
Applications: Data augmentation, creative design tools

With Extension Challenges:

Progressive Growing: FID 25-35, 2x faster convergence
Self-Attention: Better spatial coherence, sharper details
BigGAN features: FID 20-30, near state-of-the-art for CIFAR-10

🎓 Success Criteria Checklist

Minimum Requirements (for passing):

Notebook runs without errors in Google Colab with GPU
TODO 1 completed (conditional generator with embeddings)
TODO 2 completed (conditional discriminator with label concatenation)
Generated images show visual correspondence to requested classes
Grid of 10 classes displays distinct visual differences
Code is well-commented explaining architecture choices

Target Grade (for excellent work):

All 3 core TODOs completed (1, 2, 3)
WGAN-GP training is stable (no mode collapse)
Gradient penalty correctly enforces Lipschitz constraint
Generated images have recognizable CIFAR-10 features
Training loss curves are meaningful and tracked
At least 1 extension challenge attempted

Exceptional Work (bonus points):

TODO 4 completed (style mixing implemented)
Multiple extension challenges completed
FID score calculated and tracked (< 50)
Novel experiments or architectural variations
Production-quality code (modular, reusable, documented)
Ablation study comparing architectures (cGAN vs WGAN-GP vs +attention)

🛠️ Troubleshooting

Issue: "RuntimeError: CUDA out of memory"

Solution: Reduce batch size from 128 to 64 or 32:

python

batch_size = 64  # Try 32 if still failing
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

Issue: "Generated images are all gray/noisy"

Solution:

Check class embedding is properly concatenated with noise
Verify discriminator receives correct label inputs
Increase training epochs (try 50-100 epochs)
Reduce learning rate to 0.0001
Check gradient penalty weight (λ=10 is standard)

Issue: "Mode collapse - all images look identical"

Solution:

Implement WGAN-GP (TODO 3) for stable training
Increase discriminator training steps (5 D steps per 1 G step)
Add noise to discriminator inputs: img + 0.1 * torch.randn_like(img)
Check gradient penalty is working (gradient norms should be ~1.0)

Issue: "Gradient penalty returns NaN"

Solution:

Ensure interpolated images require gradients: interpolated.requires_grad_(True)
Use create_graph=True in torch.autograd.grad()
Clip gradient penalty if too large: torch.clamp(gp, 0, 100)
Reduce gradient penalty weight from 10 to 5

Issue: "Generated images don't match requested class"

Solution:

Verify one-hot encoding in discriminator is correct
Check label embedding dimension is reasonable (50-100)
Train longer (conditional GANs need more epochs than unconditional)
Increase embedding dimension from 50 to 100
Add label smoothing: use 0.9 instead of 1.0 for real labels

Issue: "Training is very slow"

Solution:

Confirm GPU is enabled: Runtime → Change runtime type → GPU
Check CUDA is being used: print(torch.cuda.is_available())
Reduce image size from 32x32 to 16x16 for faster prototyping
Use mixed precision training: torch.cuda.amp.autocast()
Reduce number of discriminator training steps per generator step

Issue: "Gradient penalty is too large (>1000)"

Solution:

Check interpolation is correct: alpha * real + (1-alpha) * fake
Normalize gradient before computing penalty: gradient = gradient / (gradient.norm() + 1e-8)
Use smaller learning rate for discriminator (0.0001)
Ensure gradients are computed correctly with create_graph=True

📚 Resources

Key Papers

Conditional GAN: Mirza & Osindero (2014) - "Conditional Generative Adversarial Nets"
WGAN: Arjovsky et al. (2017) - "Wasserstein GAN"
WGAN-GP: Gulrajani et al. (2017) - "Improved Training of Wasserstein GANs"
StyleGAN: Karras et al. (2019) - "A Style-Based Generator Architecture for GANs"
BigGAN: Brock et al. (2019) - "Large Scale GAN Training for High Fidelity Natural Image Synthesis"
SAGAN: Zhang et al. (2019) - "Self-Attention Generative Adversarial Networks"

Concept 11: Generative Adversarial Networks (GAN basics, vanilla GAN)
Concept 12: Advanced GAN Architectures (cGAN, WGAN, StyleGAN theory)
Activity 11: Vanilla GAN on MNIST (prerequisite activity)

Additional Reading

GAN Lab Interactive Visualization
Distill: Deconvolution and Checkerboard Artifacts
StyleGAN2 Paper - Advanced architecture improvements

Code References

PyTorch Examples: DCGAN
pytorch/examples
View on GitHub
WGAN-GP Implementation
caogang/wgan-gp
View on GitHub
Official StyleGAN Repository
NVlabs/stylegan2
View on GitHub

📤 Submission

Complete required TODOs (minimum: TODO 1-2)
Run entire notebook to generate all outputs (50+ epochs recommended)
Generate class-conditional samples: Create grid showing all 10 CIFAR-10 classes
Export results:
- Save generated image grid as PNG
- Save training loss plots
- Export final generator model weights (optional)
Download notebook: File -> Download -> Download .ipynb
Submit via portal: Upload .ipynb and generated images

Submission Checklist:

Filename: activity-12-[YourName].ipynb
All code cells executed successfully
Conditional generator implemented (TODO 1)
Conditional discriminator implemented (TODO 2)
Generated images show class specificity
Grid visualization shows all 10 classes
Training took at least 30 epochs
Loss curves are tracked and plotted
Comments explain architectural choices

Bonus Submission (for exceptional work):

WGAN-GP implemented (TODO 3)
Style mixing implemented (TODO 4)
FID score calculated and reported
Comparison plots (cGAN vs WGAN-GP)
Extension challenges completed

🎉 What's Next?

After mastering conditional GANs:

Move to Activity 13: Variational Autoencoders (VAE)
Learn continuous latent space manipulation for smooth interpolations
Compare VAEs vs GANs: reconstruction quality vs generation realism
Explore hybrid models (VAE-GAN) combining best of both approaches

Key Insight: Conditional GANs give you control over generation through class labels. In Activity 13, you'll learn how VAEs provide smooth, interpretable latent spaces for even finer control over generated outputs!

Good luck! GANs are among the most exciting and challenging models in deep learning. Focus on understanding the conditional mechanisms and training stability - these concepts apply to all modern generative models! 🚀

Template 12: Advanced Gan Architectures

📦 Project Files Included:

PyTorch Examples: DCGAN

WGAN-GP Implementation

Official StyleGAN Repository