By completing this activity, you will:
Understand conditional GANs (cGAN) for class-controlled generation
Implement class embeddings and label conditioning mechanisms
Apply Wasserstein GAN with Gradient Penalty (WGAN-GP) for stable training
Master gradient penalty calculation to enforce the 1-Lipschitz constraint
Explore style mixing techniques from StyleGAN architecture
Evaluate class-conditional image generation quality on CIFAR-10
Open in Google Colab : Upload this notebook to Google Colab
Enable GPU : Runtime -> Change runtime type -> GPU (T4)
Run All Cells : Click Runtime -> Run all (or press Ctrl+F9)
Watch the Magic : You'll see:
✅ CIFAR-10 dataset loading with class labels
✅ Conditional GAN architecture initialized
✅ Training loop running on GPU
✅ Class-conditional generated images (airplane, car, bird, etc.)
Expected First Run Time : ~90 seconds (GPU initialization + 1 epoch)
The template comes with 70% working code :
✅ CIFAR-10 Dataset : 60,000 images with 10 class labels loaded
✅ Conditional GAN Framework : Architecture skeleton for class-conditional generation
✅ Training Loop : Complete training harness with loss tracking
✅ Visualization Tools : Grid display for all 10 classes
✅ GPU Support : Automatic CUDA detection and model transfer
✅ Progress Tracking : Real-time loss plots and sample generation
⚠️ TODO 1 : Implement conditional generator (class embedding + noise concatenation)
⚠️ TODO 2 : Implement conditional discriminator (image + label input)
⚠️ TODO 3 : Add WGAN-GP loss (Wasserstein distance with gradient penalty)
⚠️ TODO 4 : Implement style mixing (StyleGAN feature) - Extension
Location : Section 3 - "Conditional Generator Architecture"
Current State : Generator takes noise but ignores class labels
Your Task : Make generator class-conditional by:
Adding an embedding layer for class labels (10 classes -> 50-dim embedding)
Concatenating class embedding with noise vector (z + c)
Modifying the first layer's input dimension to latent_dim + embed_dim (100 + 50 = 150)
Starter Code Provided :
```python
class ConditionalGenerator(nn.Module):
    def __init__(self, latent_dim=100, num_classes=10, embed_dim=50):
        super().__init__()
        self.model = nn.Sequential(...)
```
Key Concepts :
Embedding Layer : Maps discrete class IDs to continuous vectors
Concatenation : Combines noise and class info: torch.cat([z, c], dim=1)
Conditional Generation : Generator output depends on both z and class label
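Putting these concepts together, here is a minimal fully-connected sketch. The layer sizes and MLP structure are illustrative assumptions; the notebook's template (likely convolutional) will differ, but the embed-then-concatenate pattern is the same:

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, latent_dim=100, num_classes=10, embed_dim=50):
        super().__init__()
        # Maps each class ID (0-9) to a learned 50-dim vector
        self.label_embedding = nn.Embedding(num_classes, embed_dim)
        self.model = nn.Sequential(
            nn.Linear(latent_dim + embed_dim, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, 3 * 32 * 32),
            nn.Tanh(),  # outputs in [-1, 1] to match normalized images
        )

    def forward(self, z, labels):
        c = self.label_embedding(labels)   # (B, embed_dim)
        x = torch.cat([z, c], dim=1)       # (B, latent_dim + embed_dim)
        return self.model(x).view(-1, 3, 32, 32)
```

Note that `labels` must be a LongTensor of class IDs so `nn.Embedding` can look them up.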
Success Criteria :
Location : Section 4 - "Conditional Discriminator Architecture"
Your Task : Modify discriminator to accept both image and class label:
Convert class label to one-hot encoding (10 classes -> 10-dim vector)
Spatially replicate one-hot vector to match image dimensions
Concatenate with image channels (3 RGB + 10 label = 13 channels)
Starter Code Provided :
```python
class ConditionalDiscriminator(nn.Module):
    def forward(self, img, labels):
        # TODO: build x by concatenating img with spatially replicated label channels
        return self.model(x)
```
Key Concepts :
One-Hot Encoding : Class 3 -> [0,0,0,1,0,0,0,0,0,0]
Spatial Replication : Expand (B,10,1,1) -> (B,10,32,32) for pixel-wise conditioning
Channel Concatenation : (B,3,32,32) + (B,10,32,32) -> (B,13,32,32)
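A sketch of the label-conditioning step as a helper function (`condition_image` is an illustrative name, not part of the template):

```python
import torch
import torch.nn.functional as F

def condition_image(img, labels, num_classes=10):
    """Concatenate spatially replicated one-hot labels onto image channels."""
    B, _, H, W = img.shape
    one_hot = F.one_hot(labels, num_classes).float()   # (B, 10)
    one_hot = one_hot.view(B, num_classes, 1, 1)       # (B, 10, 1, 1)
    one_hot = one_hot.expand(B, num_classes, H, W)     # (B, 10, 32, 32)
    return torch.cat([img, one_hot], dim=1)            # (B, 13, 32, 32)
```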
Success Criteria :
Location : Section 5 - "WGAN-GP Loss Functions"
Your Task : Replace standard GAN loss with Wasserstein loss + gradient penalty:
3A. Wasserstein Distance :
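A hedged sketch of the WGAN-GP objective with illustrative names (`real_scores`/`fake_scores` are the critic's outputs on real and fake batches; `gp` comes from 3B):

```python
def critic_loss(real_scores, fake_scores, gp, lambda_gp=10.0):
    # Critic minimizes E[D(fake)] - E[D(real)] plus the gradient penalty
    return fake_scores.mean() - real_scores.mean() + lambda_gp * gp

def generator_loss(fake_scores):
    # Generator maximizes the critic's score on fakes (minimizes its negative)
    return -fake_scores.mean()
```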
3B. Gradient Penalty (enforce 1-Lipschitz constraint):
```python
def gradient_penalty(discriminator, real_images, fake_images, labels, device):
    ...  # TODO 3B: interpolate, score, differentiate, penalize (see hints below)
```
Key Concepts :
Wasserstein Distance : Measures the distance between the real and generated distributions (gives smoother, more stable gradients than JS divergence)
Lipschitz Constraint : Discriminator gradients must have ``norm <= 1``
Gradient Penalty : Soft constraint enforcing Lipschitz condition (λ=10)
WGAN-GP Benefits : Stable training, meaningful loss curves, greatly reduced mode collapse
Success Criteria :
Hints :
Use torch.autograd.grad() to compute gradients
Set create_graph=True to allow backprop through gradient computation
Interpolation: alpha * real + (1-alpha) * fake where alpha ~ Uniform(0,1)
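Combining these hints, one way to fill in the `gradient_penalty` skeleton (a sketch, not the official solution; the `.detach()` on the fakes is an added precaution):

```python
import torch

def gradient_penalty(discriminator, real_images, fake_images, labels, device):
    batch_size = real_images.size(0)
    # Per-sample alpha ~ Uniform(0, 1), broadcast over channels and pixels
    alpha = torch.rand(batch_size, 1, 1, 1, device=device)
    interpolated = alpha * real_images + (1 - alpha) * fake_images.detach()
    interpolated.requires_grad_(True)

    scores = discriminator(interpolated, labels)
    gradients = torch.autograd.grad(
        outputs=scores,
        inputs=interpolated,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,   # needed so the penalty itself can be backpropagated
        retain_graph=True,
    )[0]

    gradients = gradients.view(batch_size, -1)
    # Penalize deviation of each sample's gradient norm from 1 (λ is applied by the caller)
    return ((gradients.norm(2, dim=1) - 1) ** 2).mean()
```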
Location : New section you'll create
Your Task : Implement StyleGAN-inspired style mixing:
Generate image with two different noise vectors: z1 and z2
Use z1 for early layers (coarse features: shape, pose)
Switch to z2 at middle layers (fine features: color, texture)
Visualize how mixing point affects output
Requirements :
Modify generator to accept crossover layer index
Generate grid showing: [z1 only, mix at layer 2, mix at layer 4, z2 only]
Use different classes for z1 and z2 (e.g., "dog" structure with "car" texture)
Example Output :
```text
Row 1 : Dog (z1) → Dog-Car mix (early) → Dog-Car mix (late) → Car (z2)
Row 2 : Bird (z1) → Bird-Ship mix (early) → Bird-Ship mix (late) → Ship (z2)
```
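One way to structure a mixing-capable generator (a fully-connected sketch with assumed layer counts; the template's actual architecture will differ):

```python
import torch
import torch.nn as nn

class MixableGenerator(nn.Module):
    """Sketch: each layer's input can switch from z1 to z2 at a crossover index."""
    def __init__(self, latent_dim=100, num_layers=4, hidden=256):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Linear(latent_dim if i == 0 else hidden, hidden) for i in range(num_layers)]
        )
        # Each layer also receives a projection of the (possibly mixed) latent
        self.injections = nn.ModuleList(
            [nn.Linear(latent_dim, hidden) for _ in range(num_layers)]
        )
        self.to_img = nn.Linear(hidden, 3 * 32 * 32)

    def forward(self, z1, z2, crossover=2):
        x = z1
        for i, (layer, inject) in enumerate(zip(self.layers, self.injections)):
            z = z1 if i < crossover else z2   # coarse layers use z1, fine layers use z2
            x = torch.relu(layer(x) + inject(z))
        return torch.tanh(self.to_img(x)).view(-1, 3, 32, 32)
```

Calling `gen(z1, z2, crossover=0)` uses z2 everywhere and `crossover=4` reproduces the pure-z1 image, giving the four grid columns above.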
Success Criteria :
Once you've completed all TODOs, try these advanced challenges:
Implement progressive growing (introduced in ProGAN and carried into StyleGAN):
Start training at 8x8 resolution
Gradually increase to 16x16, then 32x32
Use fade-in layers during resolution transitions
Compare training stability vs fixed resolution
Benefits : Faster convergence, higher quality images, stable training
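The fade-in step (third bullet above) can be a simple blend; a sketch, assuming `old_rgb`/`new_rgb` are the RGB outputs at the previous and new resolution:

```python
import torch.nn.functional as F

def fade_in(old_rgb, new_rgb, alpha):
    """Blend the previous resolution's output into the new block's output.

    alpha ramps 0 -> 1 over the transition; old_rgb is upsampled to match.
    """
    old_up = F.interpolate(old_rgb, scale_factor=2, mode="nearest")
    return (1 - alpha) * old_up + alpha * new_rgb
```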
Add self-attention layers from SAGAN (Self-Attention GAN):
Implement attention module: Query, Key, Value projections
Add to both generator and discriminator at 16x16 feature maps
Visualize attention maps to see what model focuses on
Compare FID scores vs baseline cGAN
Paper : "Self-Attention Generative Adversarial Networks" (Zhang et al., 2019)
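A sketch of the Query/Key/Value attention module from the first bullet (standard SAGAN formulation; the `channels // 8` bottleneck follows the paper):

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """SAGAN-style self-attention over a feature map."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # starts as an identity mapping

    def forward(self, x):
        B, C, H, W = x.shape
        q = self.query(x).view(B, -1, H * W).permute(0, 2, 1)  # (B, HW, C//8)
        k = self.key(x).view(B, -1, H * W)                     # (B, C//8, HW)
        attn = torch.softmax(q @ k, dim=-1)                    # (B, HW, HW)
        v = self.value(x).view(B, C, H * W)                    # (B, C, HW)
        out = (v @ attn.permute(0, 2, 1)).view(B, C, H, W)
        return self.gamma * out + x
```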
Implement Fréchet Inception Distance (FID) for objective quality measurement:
Load pretrained InceptionV3 model
Extract features for real and generated images
Calculate FID = ||μ_real - μ_fake||² + Tr(Σ_real + Σ_fake - 2(Σ_real Σ_fake)^(1/2)), where the square root is the matrix square root
Track FID during training (lower is better)
Target : ``FID < 50`` is good for CIFAR-10, ``FID < 30`` is excellent
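Once features are extracted and fitted with means and covariances, the formula translates directly; a sketch assuming 2048-dim InceptionV3 pool features:

```python
import numpy as np
from scipy import linalg

def fid_from_stats(mu_real, sigma_real, mu_fake, sigma_fake):
    """FID between two Gaussians fitted to InceptionV3 features.

    mu_*: (2048,) feature means; sigma_*: (2048, 2048) feature covariances.
    """
    diff = mu_real - mu_fake
    # Matrix square root of the covariance product (may carry tiny imaginary noise)
    covmean, _ = linalg.sqrtm(sigma_real @ sigma_fake, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return diff @ diff + np.trace(sigma_real + sigma_fake - 2 * covmean)
```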
Implement components from BigGAN for high-quality generation:
Orthogonal Regularization : Add penalty term to encourage orthogonal weight matrices
Truncation Trick : Sample z from a truncated normal distribution (trades diversity for quality; see the sketch after this list)
Class Embeddings in BatchNorm : Inject class info via conditional batch normalization
Compare quality/diversity tradeoffs
Paper : "Large Scale GAN Training for High Fidelity Natural Image Synthesis" (Brock et al., 2019)
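A sketch of the truncation trick via resampling (the 0.5 threshold is an illustrative default):

```python
import torch

def truncated_noise(batch_size, latent_dim, threshold=0.5):
    """Resample z-values whose magnitude exceeds the threshold.

    Smaller thresholds trade diversity for fidelity.
    """
    z = torch.randn(batch_size, latent_dim)
    mask = z.abs() > threshold
    while mask.any():
        z[mask] = torch.randn(int(mask.sum()))
        mask = z.abs() > threshold
    return z
```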
Baseline (unconditional GAN):
Training Stability: Poor (mode collapse after ~20 epochs)
FID Score: 80-120 (higher = worse quality)
Class Specificity: None (unconditional generation)
After TODO 1-2 (conditional GAN):
Training Stability: Moderate (some instability)
FID Score: 50-80
Class Specificity: 70-80% accuracy (generated images match the requested class)
Visual Quality: Recognizable CIFAR-10 objects
After TODO 3 (WGAN-GP):
Training Stability: Excellent (no mode collapse)
FID Score: 35-50 (significant improvement)
Loss Curves: Meaningful (more negative critic loss tracks better samples)
Gradient Norms: Stable around 1.0
After TODO 4 (style mixing):
Creative Control: Can mix attributes from different classes
Diversity: More varied outputs within the same class
Applications: Data augmentation, creative design tools
With extension challenges:
Progressive Growing: FID 25-35, ~2x faster convergence
Self-Attention: Better spatial coherence, sharper details
BigGAN features: FID 20-30, near state-of-the-art for CIFAR-10
Minimum Requirements (for passing):
Target Grade (for excellent work):
Exceptional Work (bonus points):
Solution : Reduce batch size from 128 to 64 or 32:
```python
batch_size = 64
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
```
Solution :
Check class embedding is properly concatenated with noise
Verify discriminator receives correct label inputs
Increase training epochs (try 50-100 epochs)
Reduce learning rate to 0.0001
Check gradient penalty weight (λ=10 is standard)
Solution :
Implement WGAN-GP (TODO 3) for stable training
Increase discriminator training steps (5 D steps per 1 G step)
Add noise to discriminator inputs: img + 0.1 * torch.randn_like(img)
Check gradient penalty is working (gradient norms should be ~1.0)
Solution :
Ensure interpolated images require gradients: interpolated.requires_grad_(True)
Use create_graph=True in torch.autograd.grad()
Clip gradient penalty if too large: torch.clamp(gp, 0, 100)
Reduce gradient penalty weight from 10 to 5
Solution :
Verify one-hot encoding in discriminator is correct
Check label embedding dimension is reasonable (50-100)
Train longer (conditional GANs need more epochs than unconditional)
Increase embedding dimension from 50 to 100
Add label smoothing: use 0.9 instead of 1.0 for real labels
Solution :
Confirm GPU is enabled: Runtime → Change runtime type → GPU
Check CUDA is being used: print(torch.cuda.is_available())
Reduce image size from 32x32 to 16x16 for faster prototyping
Use mixed precision training: torch.cuda.amp.autocast()
Reduce number of discriminator training steps per generator step
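A mixed-precision sketch for the generator step (variable names like `optimizer_G` are assumptions; note the caveat about the gradient penalty):

```python
import torch

scaler = torch.cuda.amp.GradScaler()

# Inside the training loop:
with torch.cuda.amp.autocast():
    fake = generator(z, labels)
    g_loss = -discriminator(fake, labels).mean()

optimizer_G.zero_grad()
scaler.scale(g_loss).backward()
scaler.step(optimizer_G)
scaler.update()

# Caveat: compute the gradient penalty outside autocast; the double
# backward it requires can be unstable in float16.
```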
Solution :
Check interpolation is correct: alpha * real + (1-alpha) * fake
Compute the gradient norm with a small epsilon to avoid NaN from sqrt(0): grad_norm = torch.sqrt((gradients.view(B, -1) ** 2).sum(dim=1) + 1e-8)
Use smaller learning rate for discriminator (0.0001)
Ensure gradients are computed correctly with create_graph=True
Conditional GAN : Mirza & Osindero (2014) - "Conditional Generative Adversarial Nets"
WGAN : Arjovsky et al. (2017) - "Wasserstein GAN"
WGAN-GP : Gulrajani et al. (2017) - "Improved Training of Wasserstein GANs"
StyleGAN : Karras et al. (2019) - "A Style-Based Generator Architecture for Generative Adversarial Networks"
BigGAN : Brock et al. (2019) - "Large Scale GAN Training for High Fidelity Natural Image Synthesis"
SAGAN : Zhang et al. (2019) - "Self-Attention Generative Adversarial Networks"
Concept 11 : Generative Adversarial Networks (GAN basics, vanilla GAN)
Concept 12 : Advanced GAN Architectures (cGAN, WGAN, StyleGAN theory)
Activity 11 : Vanilla GAN on MNIST (prerequisite activity)
Complete required TODOs (minimum: TODO 1-2)
Run entire notebook to generate all outputs (50+ epochs recommended)
Generate class-conditional samples : Create grid showing all 10 CIFAR-10 classes
Export results :
Save generated image grid as PNG
Save training loss plots
Export final generator model weights (optional)
Download notebook : File -> Download -> Download .ipynb
Submit via portal : Upload .ipynb and generated images
Submission Checklist :
Bonus Submission (for exceptional work):
After mastering conditional GANs:
Move to Activity 13: Variational Autoencoders (VAE)
Learn continuous latent space manipulation for smooth interpolations
Compare VAEs vs GANs: reconstruction quality vs generation realism
Explore hybrid models (VAE-GAN) combining best of both approaches
Key Insight : Conditional GANs give you control over generation through class labels. In Activity 13, you'll learn how VAEs provide smooth, interpretable latent spaces for even finer control over generated outputs!
Good luck! GANs are among the most exciting and challenging models in deep learning. Focus on understanding the conditional mechanisms and training stability - these concepts apply to all modern generative models! 🚀