Practice and reinforce the concepts from Lesson 12
In this activity, you'll implement state-of-the-art GAN architectures including StyleGAN components, Conditional GANs, and pix2pix for image-to-image translation. You'll learn advanced techniques for high-quality generation and explore practical applications like style transfer and colorization.
By completing this activity, you will:
Download the activity template from the Templates folder:
AI25-Template-activity-12-advanced-gan-architectures.zip (Templates/AI25-Template-activity-12-advanced-gan-architectures.zip)
Upload activity-12-advanced-gan-architectures.ipynb to Google Colab
Execute the first few cells to:
TODO 1: Implement Adaptive Instance Normalization (AdaIN)
class AdaptiveInstanceNorm(nn.Module):
"""
AdaIN: Inject style into feature maps
Formula: AdaIN(x, y) = σ(y) * ((x - μ(x)) / σ(x)) + μ(y)
where:
- x: content (feature maps)
- y: style (from mapping network)
- μ, σ: mean and std
"""
def __init__(self, num_features):
super().__init__()
# TODO 1: Define the normalization layer
# self.norm = nn.InstanceNorm2d(num_features, affine=False)
# No learnable affine parameters here - the scale and shift come from the style input
def forward(self, content, style):
"""
Args:
content: (batch, channels, height, width)
style: (batch, channels * 2) - concatenated [scale, shift]
Returns:
Styled content (same shape as content)
"""
# TODO 1: Implement AdaIN
# Step 1: Normalize content (subtract mean, divide by std)
# Step 2: Split style into scale and shift
# Step 3: Apply: normalized * scale + shift
# Your code here
pass
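If you get stuck, one way the module could look is sketched below. This is a minimal reference sketch, assuming the style vector is laid out as [scale, shift] along dimension 1 (as the docstring above describes); it is not the only valid solution.

import torch
import torch.nn as nn

class AdaptiveInstanceNormSketch(nn.Module):
    """Illustrative AdaIN: normalize content, then re-scale and shift with style stats."""
    def __init__(self, num_features):
        super().__init__()
        # Normalization only; the scale and shift come from the style input
        self.norm = nn.InstanceNorm2d(num_features, affine=False)

    def forward(self, content, style):
        # content: (batch, C, H, W); style: (batch, 2*C) laid out as [scale, shift]
        normalized = self.norm(content)
        scale, shift = style.chunk(2, dim=1)        # each (batch, C)
        scale = scale.unsqueeze(-1).unsqueeze(-1)   # broadcast to (batch, C, 1, 1)
        shift = shift.unsqueeze(-1).unsqueeze(-1)
        return normalized * scale + shift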
TODO 2: Implement StyleGAN Synthesis Block
class StyleBlock(nn.Module):
"""
StyleGAN synthesis block with style modulation
"""
def __init__(self, in_channels, out_channels, w_dim=512):
super().__init__()
# TODO 2: Define block components
# 1. Upsample (nearest neighbor or bilinear)
# 2. Conv layer
# 3. AdaIN
# 4. Noise injection
# 5. Activation (LeakyReLU)
self.upsample = nn.Upsample(scale_factor=2, mode='nearest')
self.conv = nn.Conv2d(in_channels, out_channels, 3, padding=1)
self.adain = AdaptiveInstanceNorm(out_channels)
self.noise_scale = nn.Parameter(torch.zeros(1))
self.activation = nn.LeakyReLU(0.2)
# Style mapping: w → (scale, shift)
self.style_affine = nn.Linear(w_dim, out_channels * 2)
def forward(self, x, w, noise=None):
"""
Args:
x: Input features (batch, in_channels, H, W)
w: Style vector (batch, w_dim)
noise: Optional noise (batch, 1, H*2, W*2)
Returns:
Styled features (batch, out_channels, H*2, W*2)
"""
# TODO 2: Implement forward pass
# 1. Upsample input
# 2. Apply convolution
# 3. Add noise (if provided)
# 4. Apply AdaIN with style
# 5. Activation
# Your code here
pass
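For reference, the forward pass might look like the sketch below, using the attributes defined in __init__ above. Sampling noise with torch.randn when none is provided is an assumption; StyleGAN implementations also use fixed per-layer noise buffers.

import torch

# Sketch of StyleBlock.forward (drop into the class above); not the only valid ordering
def forward(self, x, w, noise=None):
    x = self.upsample(x)                       # (batch, in_channels, 2H, 2W)
    x = self.conv(x)                           # (batch, out_channels, 2H, 2W)
    if noise is None:
        noise = torch.randn(x.size(0), 1, x.size(2), x.size(3), device=x.device)
    x = x + self.noise_scale * noise           # per-pixel noise injection
    style = self.style_affine(w)               # (batch, 2*out_channels) -> [scale, shift]
    x = self.adain(x, style)
    return self.activation(x)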
TODO 3: Implement class-conditional generator and discriminator
class ConditionalGenerator(nn.Module):
def __init__(self, num_classes=10, latent_dim=100, img_size=64):
super().__init__()
# TODO 3a: Conditional generator architecture
# Approach: Concatenate z and class embedding
#
# 1. Class embedding layer: num_classes → latent_dim
# 2. Concatenate z (latent_dim) + class_emb (latent_dim) = 2*latent_dim
# 3. Generate image conditioned on concatenated vector
self.label_emb = nn.Embedding(num_classes, latent_dim)
# Your generator architecture here
pass
def forward(self, z, labels):
"""
Args:
z: Noise (batch, latent_dim)
labels: Class labels (batch,)
Returns:
Generated images (batch, 3, 64, 64)
"""
# TODO 3a: Implement conditional generation
# 1. Get label embeddings
# 2. Concatenate with z
# 3. Pass through generator
# Your code here
pass
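One possible generator is sketched below. The DCGAN-style layer widths are illustrative assumptions; any upsampling stack that maps the (2 * latent_dim)-dimensional conditioned vector to a 3x64x64 image is acceptable.

import torch
import torch.nn as nn

class ConditionalGeneratorSketch(nn.Module):
    """Illustrative cGAN generator: concatenate z with a class embedding, then upsample."""
    def __init__(self, num_classes=10, latent_dim=100, img_size=64):
        super().__init__()
        self.label_emb = nn.Embedding(num_classes, latent_dim)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(2 * latent_dim, 512, 4, 1, 0), nn.BatchNorm2d(512), nn.ReLU(True),  # 4x4
            nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(True),             # 8x8
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),             # 16x16
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),               # 32x32
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),                                         # 64x64
        )

    def forward(self, z, labels):
        cond = torch.cat([z, self.label_emb(labels)], dim=1)   # (batch, 2*latent_dim)
        return self.net(cond.unsqueeze(-1).unsqueeze(-1))       # treat vector as a 1x1 feature map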
class ConditionalDiscriminator(nn.Module):
def __init__(self, num_classes=10, img_size=64):
super().__init__()
# TODO 3b: Conditional discriminator
# Approach: Concatenate image and class-specific channel
#
# 1. Class embedding: num_classes → img_size*img_size (spatial)
# 2. Reshape to (batch, 1, img_size, img_size)
# 3. Concatenate with image: (batch, 3+1, img_size, img_size)
# 4. Standard discriminator on concatenated input
self.label_emb = nn.Embedding(num_classes, img_size*img_size)
# Your discriminator architecture here
pass
def forward(self, img, labels):
"""
Args:
img: Images (batch, 3, 64, 64)
labels: Class labels (batch,)
Returns:
Real/fake scores (batch, 1)
"""
# TODO 3b: Implement conditional discrimination
# 1. Get label embeddings and reshape to spatial
# 2. Concatenate with image channels
# 3. Pass through discriminator
# Your code here
pass
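A matching discriminator sketch, following the "label as an extra channel" approach from the comments above; the layer widths are illustrative, and the output is an unbounded logit (pair it with BCEWithLogitsLoss or add a sigmoid).

import torch
import torch.nn as nn

class ConditionalDiscriminatorSketch(nn.Module):
    """Illustrative cGAN discriminator: stack a spatial label map onto the image channels."""
    def __init__(self, num_classes=10, img_size=64):
        super().__init__()
        self.img_size = img_size
        self.label_emb = nn.Embedding(num_classes, img_size * img_size)
        self.net = nn.Sequential(
            nn.Conv2d(4, 64, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),     # 32x32
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),   # 16x16
            nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),  # 8x8
            nn.Conv2d(256, 512, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),  # 4x4
            nn.Conv2d(512, 1, 4, 1, 0),                                     # 1x1 score
        )

    def forward(self, img, labels):
        label_map = self.label_emb(labels).view(-1, 1, self.img_size, self.img_size)
        x = torch.cat([img, label_map], dim=1)   # (batch, 3+1, 64, 64)
        return self.net(x).view(-1, 1)           # (batch, 1) real/fake logits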
TODO 4: Implement U-Net generator for image-to-image translation
class UNetGenerator(nn.Module):
"""
U-Net generator for pix2pix
Architecture: Encoder-Decoder with skip connections
Input: Edges/Sketch (1 channel)
Output: RGB Image (3 channels)
"""
def __init__(self, in_channels=1, out_channels=3):
super().__init__()
# TODO 4: Implement U-Net architecture
# Encoder (downsampling):
# Conv(1 → 64) → Conv(64 → 128) → ... → Conv(512 → 512)
# Decoder (upsampling):
# ConvTranspose(512 → 512) + skip → ... → ConvTranspose(128 → 3)
#
# Key: Save encoder activations for skip connections
# Your code here
pass
def forward(self, x):
"""
Args:
x: Input edges (batch, 1, 256, 256)
Returns:
Generated RGB image (batch, 3, 256, 256)
"""
# TODO 4: Implement forward pass with skip connections
# Encoder: Save intermediate activations
# Decoder: Concatenate with corresponding encoder activations
# Your code here
pass
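As a structural hint, here is a deliberately shallow U-Net (3 down / 3 up) that shows the skip-connection wiring. The real pix2pix generator uses 8 encoder and 8 decoder blocks, so treat this purely as a sketch of the pattern, not the full architecture.

import torch
import torch.nn as nn

def down(in_ch, out_ch):
    # stride-2 conv halves the spatial resolution
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 4, 2, 1),
                         nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.2, inplace=True))

def up(in_ch, out_ch):
    # stride-2 transposed conv doubles the spatial resolution
    return nn.Sequential(nn.ConvTranspose2d(in_ch, out_ch, 4, 2, 1),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

class UNetGeneratorSketch(nn.Module):
    """Shallow U-Net sketch: encoder saves activations, decoder concatenates them back in."""
    def __init__(self, in_channels=1, out_channels=3):
        super().__init__()
        self.d1, self.d2, self.d3 = down(in_channels, 64), down(64, 128), down(128, 256)
        self.u1 = up(256, 128)
        self.u2 = up(128 + 128, 64)    # +128 channels from the d2 skip connection
        self.u3 = nn.Sequential(nn.ConvTranspose2d(64 + 64, out_channels, 4, 2, 1), nn.Tanh())

    def forward(self, x):
        e1 = self.d1(x)                               # (B, 64, 128, 128)
        e2 = self.d2(e1)                              # (B, 128, 64, 64)
        e3 = self.d3(e2)                              # (B, 256, 32, 32)
        d1 = self.u1(e3)                              # (B, 128, 64, 64)
        d2 = self.u2(torch.cat([d1, e2], dim=1))      # skip: (B, 64, 128, 128)
        return self.u3(torch.cat([d2, e1], dim=1))    # (B, 3, 256, 256)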
class PatchGANDiscriminator(nn.Module):
"""
PatchGAN discriminator (70×70 receptive field)
Outputs: (batch, 1, 30, 30) - real/fake for each 70×70 patch
"""
def __init__(self, in_channels=4): # 3 (real img) + 1 (condition)
super().__init__()
# TODO 4: Implement PatchGAN discriminator
# Architecture: 5 conv layers (stride 2 for the first three, stride 1 for the last two)
# Output shape: (batch, 1, 30, 30) for a 256x256 input (each output covers a 70x70 patch)
# Your code here
pass
def forward(self, img, condition):
"""
Args:
img: Generated or real image (batch, 3, 256, 256)
condition: Input edges (batch, 1, 256, 256)
Returns:
Patch-wise scores (batch, 1, 30, 30)
"""
# TODO 4: Concatenate img and condition, then discriminate
# Your code here
pass
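A possible PatchGAN sketch is below. To land on a 30x30 output with a 70x70 receptive field on 256x256 inputs, only the first three convolutions use stride 2 and the last two use stride 1; the 64/128/256/512 widths are the usual choice but should be treated as assumptions.

import torch
import torch.nn as nn

class PatchGANDiscriminatorSketch(nn.Module):
    """Illustrative 70x70 PatchGAN: per-patch real/fake logits of shape (batch, 1, 30, 30)."""
    def __init__(self, in_channels=4):   # 3 image channels + 1 condition channel
        super().__init__()
        def block(i, o, stride, norm=True):
            layers = [nn.Conv2d(i, o, 4, stride, 1)]
            if norm:
                layers.append(nn.BatchNorm2d(o))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers
        self.net = nn.Sequential(
            *block(in_channels, 64, 2, norm=False),   # 128x128
            *block(64, 128, 2),                       # 64x64
            *block(128, 256, 2),                      # 32x32
            *block(256, 512, 1),                      # 31x31
            nn.Conv2d(512, 1, 4, 1, 1),               # 30x30 patch logits
        )

    def forward(self, img, condition):
        # Condition the discriminator by stacking image and edge map along channels
        return self.net(torch.cat([img, condition], dim=1))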
TODO 5: Implement pix2pix loss (adversarial + L1)
def pix2pix_loss(d_real, d_fake, fake_img, real_img, lambda_l1=100):
"""
pix2pix loss = GAN loss + λ * L1 loss
Args:
d_real: Discriminator output on real pairs
d_fake: Discriminator output on fake pairs
fake_img: Generated image
real_img: Ground truth image
lambda_l1: Weight for L1 reconstruction loss
Returns:
g_loss: Generator loss
d_loss: Discriminator loss
"""
# TODO 5: Implement pix2pix loss
# Discriminator loss: standard GAN loss
# Generator loss: adversarial loss + lambda_l1 * L1(fake, real)
# Your code here
pass
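One way to write this loss is sketched below, assuming the discriminator outputs raw logits (so BCE-with-logits is used). In a full training loop you would evaluate the discriminator on detached fakes for the D update and on non-detached fakes for the G update; the sketch leaves that bookkeeping to the training code.

import torch
import torch.nn.functional as F

def pix2pix_loss_sketch(d_real, d_fake, fake_img, real_img, lambda_l1=100):
    """Illustrative pix2pix objective: BCE adversarial terms plus weighted L1 reconstruction."""
    real_labels = torch.ones_like(d_real)
    fake_labels = torch.zeros_like(d_fake)

    # Discriminator: push real pairs toward 1 and fake pairs toward 0
    d_loss = 0.5 * (F.binary_cross_entropy_with_logits(d_real, real_labels) +
                    F.binary_cross_entropy_with_logits(d_fake, fake_labels))

    # Generator: fool D on fake pairs and stay close to the ground-truth image
    g_loss = (F.binary_cross_entropy_with_logits(d_fake, real_labels) +
              lambda_l1 * F.l1_loss(fake_img, real_img))
    return g_loss, d_loss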
TODO 6: Implement WGAN-GP loss and gradient penalty
def wgan_discriminator_loss(d_real, d_fake):
"""
WGAN discriminator loss (Earth Mover's Distance approximation)
The critic maximizes E[D(real)] - E[D(fake)] (an estimate of the Wasserstein distance)
Returns:
Loss to minimize: E[D(fake)] - E[D(real)]
"""
# TODO 6a: Implement WGAN D loss
# Your code here
pass
def wgan_generator_loss(d_fake):
"""
WGAN generator loss
Loss = -E[D(G(z))] (want D(fake) to be high)
"""
# TODO 6b: Implement WGAN G loss
# Your code here
pass
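Both losses reduce to a couple of lines; a sketch, assuming d_real and d_fake are raw critic scores:

def wgan_discriminator_loss_sketch(d_real, d_fake):
    # Critic wants E[D(real)] - E[D(fake)] to be large, so minimize the negated gap
    return d_fake.mean() - d_real.mean()

def wgan_generator_loss_sketch(d_fake):
    # Generator wants the critic to score fakes highly
    return -d_fake.mean()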
def gradient_penalty(discriminator, real_samples, fake_samples, device):
"""
Gradient penalty for WGAN-GP
GP = E[(||∇D(x̂)||₂ - 1)²]
where x̂ = εx_real + (1-ε)x_fake, ε ~ U(0,1)
Args:
discriminator: Discriminator model
real_samples: Real images (batch, 3, H, W)
fake_samples: Fake images (batch, 3, H, W)
device: 'cuda' or 'cpu'
Returns:
gp: Gradient penalty scalar
"""
# TODO 6c: Implement gradient penalty
# Step 1: Interpolate between real and fake
# epsilon = torch.rand(batch_size, 1, 1, 1)
# interpolated = epsilon * real + (1 - epsilon) * fake
# Step 2: Compute discriminator output on interpolated
# d_interpolated = discriminator(interpolated)
# Step 3: Compute gradients w.r.t. interpolated
# gradients = autograd.grad(outputs=d_interpolated, inputs=interpolated, ...)
# Step 4: Flatten gradients per sample, then compute the penalty
# gradients = gradients.view(batch_size, -1)
# gp = ((gradients.norm(2, dim=1) - 1) ** 2).mean()
# Your code here
pass
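Putting the four steps together, a gradient-penalty sketch might look like this; the per-sample flattening before the norm is the detail that is easiest to miss.

import torch
from torch import autograd

def gradient_penalty_sketch(discriminator, real_samples, fake_samples, device):
    """Illustrative WGAN-GP penalty on random interpolates between real and fake samples."""
    batch_size = real_samples.size(0)
    epsilon = torch.rand(batch_size, 1, 1, 1, device=device)
    # Detach fakes so the interpolate is a leaf tensor we can require gradients on
    interpolated = epsilon * real_samples + (1 - epsilon) * fake_samples.detach()
    interpolated.requires_grad_(True)

    d_interpolated = discriminator(interpolated)
    gradients = autograd.grad(
        outputs=d_interpolated,
        inputs=interpolated,
        grad_outputs=torch.ones_like(d_interpolated),
        create_graph=True,
        retain_graph=True,
    )[0]

    # Flatten per sample, take the L2 norm, and penalize deviation from 1
    gradients = gradients.view(batch_size, -1)
    return ((gradients.norm(2, dim=1) - 1) ** 2).mean()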
Pre-built FID (Fréchet Inception Distance) evaluation:
Interpretation:
TODO 7: Implement progressive growing (4x4 -> 8x8 -> 16x16 -> 32x32 -> 64x64)
class ProgressiveGenerator(nn.Module):
"""
Progressive GAN generator
Grows from 4×4 → 8×8 → 16×16 → 32×32 → 64×64
"""
def __init__(self, latent_dim=512, max_resolution=64):
super().__init__()
# TODO 7: Define resolution-specific blocks
# self.blocks = {
# 4: Block_4x4,
# 8: Block_8x8,
# 16: Block_16x16,
# ...
# }
# Your code here
pass
def forward(self, z, target_resolution, alpha=1.0):
"""
Args:
z: Latent vector (batch, latent_dim)
target_resolution: Output resolution (4, 8, 16, 32, or 64)
alpha: Fade-in parameter (0 to 1)
Returns:
Generated image at target resolution
"""
# TODO 7: Implement progressive forward pass
# If alpha < 1, blend current resolution with upsampled previous resolution
# Your code here
pass
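The fade-in blend itself is a single line; a sketch of how it could be used inside forward is shown below. to_rgb_prev and to_rgb_new are hypothetical per-resolution output heads, named here only for illustration.

import torch.nn.functional as F

def fade_in(alpha, upsampled_prev, current):
    # Blend the previous-resolution output (upsampled) with the new block's output;
    # alpha ramps from 0 to 1 while the new resolution stage is being introduced
    return alpha * current + (1 - alpha) * upsampled_prev

# Inside ProgressiveGenerator.forward (sketch; to_rgb_* are hypothetical output heads):
# if alpha < 1.0:
#     prev_up = F.interpolate(to_rgb_prev(x_prev), scale_factor=2, mode='nearest')
#     return fade_in(alpha, prev_up, to_rgb_new(x_new))
# return to_rgb_new(x_new)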
AdaIN Test:
Content: Random feature maps (64 channels)
Style 1: "Blue tint" vector
Style 2: "Red tint" vector
✓ AdaIN(content, style1): Blue-tinted features
✓ AdaIN(content, style2): Red-tinted features
✓ Smooth style interpolation
Class-Conditional Generation (CelebA attributes):
Generate "Smiling, Female, Young":
✓ 10/10 images show smiling young women
Generate "Male, Beard, Glasses":
✓ 10/10 images show bearded men with glasses
✓ Conditional generation accurate
Edges -> Photo Translation:
Input: Shoe edge drawing
Output: Realistic shoe photo
✓ Colors realistic
✓ Textures detailed
✓ Structure preserved
✓ No artifacts
FID Score: 12.5 (excellent)
Training Stability Comparison:
Standard GAN: 40% runs diverge
WGAN-GP: 0% runs diverge
✓ Perfect training stability
✓ Smoother loss curves
✓ Better mode coverage
Quality Comparison:
Model          | FID Score | Quality
---------------|-----------|----------
Standard GAN   | 35.2      | Moderate
WGAN-GP        | 18.7      | Good
StyleGAN       | 8.3       | Excellent
pix2pix        | 12.5      | Excellent
✓ FID correlates with visual quality
Resolution Progression:
4×4 (Epoch 1-5): Blobs, basic shapes
8×8 (Epoch 6-10): Recognizable objects
16×16 (Epoch 11-15): Detailed features
32×32 (Epoch 16-20): High quality
64×64 (Epoch 21-25): Photo-realistic
✓ Smooth quality improvement
✓ No training collapse
Your implementation is complete when:
1. L1 Loss Weight:
2. PatchGAN Discriminator:
3. Skip Connections:
1. Critic Updates:
2. Gradient Penalty Weight:
3. No BatchNorm in Critic:
1. Fade-In Duration:
2. Resolution Stages:
Implement StyleGAN2 enhancements:
Benefit: Better image quality, fewer artifacts
Implement unpaired image translation:
class CycleGAN:
def __init__(self):
self.G_AB = Generator() # A → B
self.G_BA = Generator() # B → A
self.D_A = Discriminator()
self.D_B = Discriminator()
def cycle_consistency_loss(self, real_A, fake_B, reconstructed_A):
"""
Ensure G_BA(G_AB(A)) ≈ A
"""
return nn.functional.l1_loss(reconstructed_A, real_A)
Use case: Horse <-> Zebra, Summer <-> Winter
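For context, one half of a generator update for the CycleGAN class above could look like the sketch below. The LSGAN-style adversarial term and lambda_cycle = 10 are common choices, but both are assumptions here, not part of the activity specification.

import torch
import torch.nn.functional as F

def cycle_generator_loss(model, real_A, lambda_cycle=10.0):
    """Sketch of the A -> B -> A half of a CycleGAN generator update."""
    fake_B = model.G_AB(real_A)
    rec_A = model.G_BA(fake_B)
    d_out = model.D_B(fake_B)
    adv = F.mse_loss(d_out, torch.ones_like(d_out))   # fool D_B (least-squares GAN loss)
    cycle = F.l1_loss(rec_A, real_A)                  # enforce G_BA(G_AB(A)) ~ A
    return adv + lambda_cycle * cycle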
Single generator for multiple domain translations:
def stargan_generator(img, target_domain):
"""
Generate image with target domain attributes
"""
pass
Use case: Change hair color, age, gender with one model
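One common way to implement this conditioning is to broadcast a one-hot target-domain vector over space and concatenate it with the image channels. The sketch below uses illustrative layer sizes and is far shallower than the real StarGAN generator; treat it only as a picture of the conditioning mechanism.

import torch
import torch.nn as nn

class StarGANGeneratorSketch(nn.Module):
    """Sketch: image + spatially broadcast domain label in, translated image out."""
    def __init__(self, num_domains=5, img_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(img_channels + num_domains, 64, 7, 1, 3), nn.ReLU(inplace=True),
            nn.Conv2d(64, img_channels, 7, 1, 3), nn.Tanh(),   # the real model is much deeper
        )

    def forward(self, img, target_domain):
        # target_domain: one-hot (batch, num_domains) -> broadcast to (batch, num_domains, H, W)
        d = target_domain[:, :, None, None].expand(-1, -1, img.size(2), img.size(3))
        return self.net(torch.cat([img, d], dim=1))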
Train StyleGAN on 1024x1024 images:
Completed Notebook: activity-12-advanced-gan-architectures.ipynb
Generated Samples:
FID Scores:
Analysis (7-10 sentences):
Next Activity: Activity 13 - Diffusion Models
This activity is graded on:
Passing Grade: 70% or higher
Congratulations on mastering advanced GAN architectures! 🎉🚀