Practice and reinforce the concepts from Lesson 12
In this activity, you'll implement state-of-the-art GAN architectures including StyleGAN components, Conditional GANs, and pix2pix for image-to-image translation. You'll learn advanced techniques for high-quality generation and explore practical applications like style transfer and colorization.
By completing this activity, you will:
Download the activity template from the Templates folder:
AI25-Template-activity-12-advanced-gan-architectures.zip (Templates/AI25-Template-activity-12-advanced-gan-architectures.zip)
Upload activity-12-advanced-gan-architectures.ipynb to Google Colab
Execute the first few cells to:
TODO 1: Implement Adaptive Instance Normalization (AdaIN)
class AdaptiveInstanceNorm(nn.Module):
"""
AdaIN: Inject style into feature maps
Formula: AdaIN(x, y) = σ(y) * ((x - μ(x)) / σ(x)) + μ(y)
where:
- x: content (feature maps)
- y: style (from mapping network)
- μ, σ: mean and std
"""
def __init__(self, num_features):
super().__init__()
# TODO 1: Define the normalization layer
# self.norm = nn.InstanceNorm2d(num_features, affine=False)
# No learnable affine parameters here - the scale and shift come from the style input
def forward(self, content, style):
"""
Args:
content: (batch, channels, height, width)
style: (batch, channels * 2) - concatenated [scale, shift]
Returns:
Styled content (same shape as content)
"""
# TODO 1: Implement AdaIN
# Step 1: Normalize content (subtract mean, divide by std)
# Step 2: Split style into scale and shift
# Step 3: Apply: normalized * scale + shift
# Your code here
pass
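If you get stuck, one way the module could look is sketched below. This is a minimal reference sketch, assuming the style vector is laid out as [scale, shift] along dimension 1 (as the docstring above describes); it is not the only valid solution.

import torch
import torch.nn as nn

class AdaptiveInstanceNormSketch(nn.Module):
    """Illustrative AdaIN: normalize content, then re-scale and shift with style stats."""
    def __init__(self, num_features):
        super().__init__()
        # Normalization only; the scale and shift come from the style input
        self.norm = nn.InstanceNorm2d(num_features, affine=False)

    def forward(self, content, style):
        # content: (batch, C, H, W); style: (batch, 2*C) laid out as [scale, shift]
        normalized = self.norm(content)
        scale, shift = style.chunk(2, dim=1)        # each (batch, C)
        scale = scale.unsqueeze(-1).unsqueeze(-1)   # broadcast to (batch, C, 1, 1)
        shift = shift.unsqueeze(-1).unsqueeze(-1)
        return normalized * scale + shift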
TODO 2: Implement StyleGAN Synthesis Block
class StyleBlock(nn.Module):
"""
StyleGAN synthesis block with style modulation
"""
def __init__(self, in_channels, out_channels, w_dim=512):
super().__init__()
# TODO 2: Define block components
# 1. Upsample (nearest neighbor or bilinear)
# 2. Conv layer
# 3. AdaIN
# 4. Noise injection
# 5. Activation (LeakyReLU)
self.upsample = nn.Upsample(scale_factor=2, mode='nearest')
self.conv = nn.Conv2d(in_channels, out_channels, 3, padding=1)
self.adain = AdaptiveInstanceNorm(out_channels)
self.noise_scale = nn.Parameter(torch.zeros(1))
self.activation = nn.LeakyReLU(0.2)
# Style mapping: w → (scale, shift)
self.style_affine = nn.Linear(w_dim, out_channels * 2)
def forward(self, x, w, noise=None):
"""
Args:
x: Input features (batch, in_channels, H, W)
w: Style vector (batch, w_dim)
noise: Optional noise (batch, 1, H*2, W*2)
Returns:
Styled features (batch, out_channels, H*2, W*2)
"""
# TODO 2: Implement forward pass
# 1. Upsample input
# 2. Apply convolution
# 3. Add noise (if provided)
# 4. Apply AdaIN with style
# 5. Activation
# Your code here
pass
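For reference, the forward pass might look like the sketch below, using the attributes defined in __init__ above. Sampling noise with torch.randn when none is provided is an assumption; StyleGAN implementations also use fixed per-layer noise buffers.

import torch

# Sketch of StyleBlock.forward (drop into the class above); not the only valid ordering
def forward(self, x, w, noise=None):
    x = self.upsample(x)                       # (batch, in_channels, 2H, 2W)
    x = self.conv(x)                           # (batch, out_channels, 2H, 2W)
    if noise is None:
        noise = torch.randn(x.size(0), 1, x.size(2), x.size(3), device=x.device)
    x = x + self.noise_scale * noise           # per-pixel noise injection
    style = self.style_affine(w)               # (batch, 2*out_channels) -> [scale, shift]
    x = self.adain(x, style)
    return self.activation(x)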
TODO 3: Implement class-conditional generator and discriminator
class ConditionalGenerator(nn.Module):
def __init__(self, num_classes=10, latent_dim=100, img_size=64):
super().__init__()
# TODO 3a: Conditional generator architecture
# Approach: Concatenate z and class embedding
#
# 1. Class embedding layer: num_classes → latent_dim
# 2. Concatenate z (latent_dim) + class_emb (latent_dim) = 2*latent_dim
# 3. Generate image conditioned on concatenated vector
self.label_emb = nn.Embedding(num_classes, latent_dim)
# Your generator architecture here
pass
def forward(self, z, labels):
"""
Args:
z: Noise (batch, latent_dim)
labels: Class labels (batch,)
Returns:
Generated images (batch, 3, 64, 64)
"""
# TODO 3a: Implement conditional generation
# 1. Get label embeddings
# 2. Concatenate with z
# 3. Pass through generator
# Your code here
pass
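One possible generator is sketched below. The DCGAN-style layer widths are illustrative assumptions; any upsampling stack that maps the (2 * latent_dim)-dimensional conditioned vector to a 3x64x64 image is acceptable.

import torch
import torch.nn as nn

class ConditionalGeneratorSketch(nn.Module):
    """Illustrative cGAN generator: concatenate z with a class embedding, then upsample."""
    def __init__(self, num_classes=10, latent_dim=100, img_size=64):
        super().__init__()
        self.label_emb = nn.Embedding(num_classes, latent_dim)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(2 * latent_dim, 512, 4, 1, 0), nn.BatchNorm2d(512), nn.ReLU(True),  # 4x4
            nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(True),             # 8x8
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),             # 16x16
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),               # 32x32
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),                                         # 64x64
        )

    def forward(self, z, labels):
        cond = torch.cat([z, self.label_emb(labels)], dim=1)   # (batch, 2*latent_dim)
        return self.net(cond.unsqueeze(-1).unsqueeze(-1))       # treat vector as a 1x1 feature map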
class ConditionalDiscriminator(nn.Module):
def __init__(self, num_classes=10, img_size=64):
super().__init__()
# TODO 3b: Conditional discriminator
# Approach: Concatenate image and class-specific channel
#
# 1. Class embedding: num_classes → img_size*img_size (spatial)
# 2. Reshape to (batch, 1, img_size, img_size)
# 3. Concatenate with image: (batch, 3+1, img_size, img_size)
# 4. Standard discriminator on concatenated input
self.label_emb = nn.Embedding(num_classes, img_size*img_size)
# Your discriminator architecture here
pass
def forward(self, img, labels):
"""
Args:
img: Images (batch, 3, 64, 64)
labels: Class labels (batch,)
Returns:
Real/fake scores (batch, 1)
"""
# TODO 3b: Implement conditional discrimination
# 1. Get label embeddings and reshape to spatial
# 2. Concatenate with image channels
# 3. Pass through discriminator
# Your code here
pass
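A matching discriminator sketch, following the "label as an extra channel" approach from the comments above; the layer widths are illustrative, and the output is an unbounded logit (pair it with BCEWithLogitsLoss or add a sigmoid).

import torch
import torch.nn as nn

class ConditionalDiscriminatorSketch(nn.Module):
    """Illustrative cGAN discriminator: stack a spatial label map onto the image channels."""
    def __init__(self, num_classes=10, img_size=64):
        super().__init__()
        self.img_size = img_size
        self.label_emb = nn.Embedding(num_classes, img_size * img_size)
        self.net = nn.Sequential(
            nn.Conv2d(4, 64, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),     # 32x32
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),   # 16x16
            nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),  # 8x8
            nn.Conv2d(256, 512, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),  # 4x4
            nn.Conv2d(512, 1, 4, 1, 0),                                     # 1x1 score
        )

    def forward(self, img, labels):
        label_map = self.label_emb(labels).view(-1, 1, self.img_size, self.img_size)
        x = torch.cat([img, label_map], dim=1)   # (batch, 3+1, 64, 64)
        return self.net(x).view(-1, 1)           # (batch, 1) real/fake logits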
TODO 4: Implement U-Net generator for image-to-image translation
class UNetGenerator(nn.Module):
"""
U-Net generator for pix2pix
Architecture: Encoder-Decoder with skip connections
Input: Edges/Sketch (1 channel)
Output: RGB Image (3 channels)
"""
def __init__(self, in_channels=1, out_channels=3):
super().__init__()
# TODO 4: Implement U-Net architecture
# Encoder (downsampling):
# Conv(1 → 64) → Conv(64 → 128) → ... → Conv(512 → 512)
# Decoder (upsampling):
# ConvTranspose(512 → 512) + skip → ... → ConvTranspose(128 → 3)
#
# Key: Save encoder activations for skip connections
# Your code here
pass
def forward(self, x):
"""
Args:
x: Input edges (batch, 1, 256, 256)
Returns:
Generated RGB image (batch, 3, 256, 256)
"""
# TODO 4: Implement forward pass with skip connections
# Encoder: Save intermediate activations
# Decoder: Concatenate with corresponding encoder activations
# Your code here
pass
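As a structural hint, here is a deliberately shallow U-Net (3 down / 3 up) that shows the skip-connection wiring. The real pix2pix generator uses 8 encoder and 8 decoder blocks, so treat this purely as a sketch of the pattern, not the full architecture.

import torch
import torch.nn as nn

def down(in_ch, out_ch):
    # stride-2 conv halves the spatial resolution
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 4, 2, 1),
                         nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.2, inplace=True))

def up(in_ch, out_ch):
    # stride-2 transposed conv doubles the spatial resolution
    return nn.Sequential(nn.ConvTranspose2d(in_ch, out_ch, 4, 2, 1),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

class UNetGeneratorSketch(nn.Module):
    """Shallow U-Net sketch: encoder saves activations, decoder concatenates them back in."""
    def __init__(self, in_channels=1, out_channels=3):
        super().__init__()
        self.d1, self.d2, self.d3 = down(in_channels, 64), down(64, 128), down(128, 256)
        self.u1 = up(256, 128)
        self.u2 = up(128 + 128, 64)    # +128 channels from the d2 skip connection
        self.u3 = nn.Sequential(nn.ConvTranspose2d(64 + 64, out_channels, 4, 2, 1), nn.Tanh())

    def forward(self, x):
        e1 = self.d1(x)                               # (B, 64, 128, 128)
        e2 = self.d2(e1)                              # (B, 128, 64, 64)
        e3 = self.d3(e2)                              # (B, 256, 32, 32)
        d1 = self.u1(e3)                              # (B, 128, 64, 64)
        d2 = self.u2(torch.cat([d1, e2], dim=1))      # skip: (B, 64, 128, 128)
        return self.u3(torch.cat([d2, e1], dim=1))    # (B, 3, 256, 256)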
class PatchGANDiscriminator(nn.Module):
"""
PatchGAN discriminator (70×70 receptive field)
Outputs: (batch, 1, 30, 30) - real/fake for each 70×70 patch
"""
def __init__(self, in_channels=4): # 3 (real img) + 1 (condition)
super().__init__()
# TODO 4: Implement PatchGAN discriminator
# Architecture: 5 conv layers (stride 2 for the first three, stride 1 for the last two)
# Output shape: (batch, 1, 30, 30) for a 256x256 input (each output covers a 70x70 patch)
# Your code here
pass
def forward(self, img, condition):
"""
Args:
img: Generated or real image (batch, 3, 256, 256)
condition: Input edges (batch, 1, 256, 256)
Returns:
Patch-wise scores (batch, 1, 30, 30)
"""
# TODO 4: Concatenate img and condition, then discriminate
# Your code here
pass
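A possible PatchGAN sketch is below. To land on a 30x30 output with a 70x70 receptive field on 256x256 inputs, only the first three convolutions use stride 2 and the last two use stride 1; the 64/128/256/512 widths are the usual choice but should be treated as assumptions.

import torch
import torch.nn as nn

class PatchGANDiscriminatorSketch(nn.Module):
    """Illustrative 70x70 PatchGAN: per-patch real/fake logits of shape (batch, 1, 30, 30)."""
    def __init__(self, in_channels=4):   # 3 image channels + 1 condition channel
        super().__init__()
        def block(i, o, stride, norm=True):
            layers = [nn.Conv2d(i, o, 4, stride, 1)]
            if norm:
                layers.append(nn.BatchNorm2d(o))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers
        self.net = nn.Sequential(
            *block(in_channels, 64, 2, norm=False),   # 128x128
            *block(64, 128, 2),                       # 64x64
            *block(128, 256, 2),                      # 32x32
            *block(256, 512, 1),                      # 31x31
            nn.Conv2d(512, 1, 4, 1, 1),               # 30x30 patch logits
        )

    def forward(self, img, condition):
        # Condition the discriminator by stacking image and edge map along channels
        return self.net(torch.cat([img, condition], dim=1))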
TODO 5: Implement pix2pix loss (adversarial + L1)
def pix2pix_loss(d_real, d_fake, fake_img, real_img, lambda_l1=100):
"""
pix2pix loss = GAN loss + λ * L1 loss
Args:
d_real: Discriminator output on real pairs
d_fake: Discriminator output on fake pairs
fake_img: Generated image
real_img: Ground truth image
lambda_l1: Weight for L1 reconstruction loss
Returns:
g_loss: Generator loss
d_loss: Discriminator loss
"""
# TODO 5: Implement pix2pix loss
# Discriminator loss: standard GAN loss
# Generator loss: adversarial loss + lambda_l1 * L1(fake, real)
# Your code here
pass
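One way to write this loss is sketched below, assuming the discriminator outputs raw logits (so BCE-with-logits is used). In a full training loop you would evaluate the discriminator on detached fakes for the D update and on non-detached fakes for the G update; the sketch leaves that bookkeeping to the training code.

import torch
import torch.nn.functional as F

def pix2pix_loss_sketch(d_real, d_fake, fake_img, real_img, lambda_l1=100):
    """Illustrative pix2pix objective: BCE adversarial terms plus weighted L1 reconstruction."""
    real_labels = torch.ones_like(d_real)
    fake_labels = torch.zeros_like(d_fake)

    # Discriminator: push real pairs toward 1 and fake pairs toward 0
    d_loss = 0.5 * (F.binary_cross_entropy_with_logits(d_real, real_labels) +
                    F.binary_cross_entropy_with_logits(d_fake, fake_labels))

    # Generator: fool D on fake pairs and stay close to the ground-truth image
    g_loss = (F.binary_cross_entropy_with_logits(d_fake, real_labels) +
              lambda_l1 * F.l1_loss(fake_img, real_img))
    return g_loss, d_loss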
TODO 6: Implement WGAN-GP loss and gradient penalty
def wgan_discriminator_loss(d_real, d_fake):
"""
WGAN discriminator loss (Earth Mover's Distance approximation)
The critic maximizes E[D(real)] - E[D(fake)] (an estimate of the Wasserstein distance)
Returns:
Loss to minimize: E[D(fake)] - E[D(real)]
"""
# TODO 6a: Implement WGAN D loss
# Your code here
pass
def wgan_generator_loss(d_fake):
"""
WGAN generator loss
Loss = -E[D(G(z))] (want D(fake) to be high)
"""
# TODO 6b: Implement WGAN G loss
# Your code here
pass
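Both losses reduce to a couple of lines; a sketch, assuming d_real and d_fake are raw critic scores:

def wgan_discriminator_loss_sketch(d_real, d_fake):
    # Critic wants E[D(real)] - E[D(fake)] to be large, so minimize the negated gap
    return d_fake.mean() - d_real.mean()

def wgan_generator_loss_sketch(d_fake):
    # Generator wants the critic to score fakes highly
    return -d_fake.mean()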
def gradient_penalty(discriminator, real_samples, fake_samples, device):
"""
Gradient penalty for WGAN-GP
GP = E[(||∇D(x̂)||₂ - 1)²]
where x̂ = εx_real + (1-ε)x_fake, ε ~ U(0,1)
Args:
discriminator: Discriminator model
real_samples: Real images (batch, 3, H, W)
fake_samples: Fake images (batch, 3, H, W)
device: 'cuda' or 'cpu'
Returns:
gp: Gradient penalty scalar
"""
# TODO 6c: Implement gradient penalty
# Step 1: Interpolate between real and fake
# epsilon = torch.rand(batch_size, 1, 1, 1)
# interpolated = epsilon * real + (1 - epsilon) * fake
# Step 2: Compute discriminator output on interpolated
# d_interpolated = discriminator(interpolated)
# Step 3: Compute gradients w.r.t. interpolated
# gradients = autograd.grad(outputs=d_interpolated, inputs=interpolated, ...)
# Step 4: Flatten gradients per sample, then compute the penalty
# gradients = gradients.view(batch_size, -1)
# gp = ((gradients.norm(2, dim=1) - 1) ** 2).mean()
# Your code here
pass
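Putting the four steps together, a gradient-penalty sketch might look like this; the per-sample flattening before the norm is the detail that is easiest to miss.

import torch
from torch import autograd

def gradient_penalty_sketch(discriminator, real_samples, fake_samples, device):
    """Illustrative WGAN-GP penalty on random interpolates between real and fake samples."""
    batch_size = real_samples.size(0)
    epsilon = torch.rand(batch_size, 1, 1, 1, device=device)
    # Detach fakes so the interpolate is a leaf tensor we can require gradients on
    interpolated = epsilon * real_samples + (1 - epsilon) * fake_samples.detach()
    interpolated.requires_grad_(True)

    d_interpolated = discriminator(interpolated)
    gradients = autograd.grad(
        outputs=d_interpolated,
        inputs=interpolated,
        grad_outputs=torch.ones_like(d_interpolated),
        create_graph=True,
        retain_graph=True,
    )[0]

    # Flatten per sample, take the L2 norm, and penalize deviation from 1
    gradients = gradients.view(batch_size, -1)
    return ((gradients.norm(2, dim=1) - 1) ** 2).mean()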
Pre-built FID (Fréchet Inception Distance) evaluation:
Interpretation:
TODO 7: Implement progressive growing (4x4 -> 8x8 -> 16x16 -> 32x32 -> 64x64)
class ProgressiveGenerator(nn.Module):
"""
Progressive GAN generator
Grows from 4×4 → 8×8 → 16×16 → 32×32 → 64×64
"""
def __init__(self, latent_dim=512, max_resolution=64):
super().__init__()
# TODO 7: Define resolution-specific blocks
# self.blocks = {
# 4: Block_4x4,
# 8: Block_8x8,
# 16: Block_16x16,
# ...
# }
# Your code here
pass
def forward(self, z, target_resolution, alpha=1.0):
"""
Args:
z: Latent vector (batch, latent_dim)
target_resolution: Output resolution (4, 8, 16, 32, or 64)
alpha: Fade-in parameter (0 to 1)
Returns:
Generated image at target resolution
"""
# TODO 7: Implement progressive forward pass
# If alpha < 1, blend current resolution with upsampled previous resolution
# Your code here
pass
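The fade-in blend itself is a single line; a sketch of how it could be used inside forward is shown below. to_rgb_prev and to_rgb_new are hypothetical per-resolution output heads, named here only for illustration.

import torch.nn.functional as F

def fade_in(alpha, upsampled_prev, current):
    # Blend the previous-resolution output (upsampled) with the new block's output;
    # alpha ramps from 0 to 1 while the new resolution stage is being introduced
    return alpha * current + (1 - alpha) * upsampled_prev

# Inside ProgressiveGenerator.forward (sketch; to_rgb_* are hypothetical output heads):
# if alpha < 1.0:
#     prev_up = F.interpolate(to_rgb_prev(x_prev), scale_factor=2, mode='nearest')
#     return fade_in(alpha, prev_up, to_rgb_new(x_new))
# return to_rgb_new(x_new)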
AdaIN Test:
Content: Random feature maps (64 channels)
Style 1: "Blue tint" vector
Style 2: "Red tint" vector
✓ AdaIN(content, style1): Blue-tinted features
✓ AdaIN(content, style2): Red-tinted features
✓ Smooth style interpolation
Class-Conditional Generation (CelebA attributes):
Generate "Smiling, Female, Young":
✓ 10/10 images show smiling young women
Generate "Male, Beard, Glasses":
✓ 10/10 images show bearded men with glasses
✓ Conditional generation accurate
Edges -> Photo Translation:
Input: Shoe edge drawing
Output: Realistic shoe photo
✓ Colors realistic
✓ Textures detailed
✓ Structure preserved
✓ No artifacts
FID Score: 12.5 (excellent)
Training Stability Comparison:
Standard GAN: 40% runs diverge
WGAN-GP: 0% runs diverge
✓ Perfect training stability
✓ Smoother loss curves
✓ Better mode coverage
Quality Comparison:
Model          | FID Score | Quality
---------------|-----------|----------
Standard GAN   | 35.2      | Moderate
WGAN-GP        | 18.7      | Good
StyleGAN       | 8.3       | Excellent
pix2pix        | 12.5      | Excellent
✓ FID correlates with visual quality
Resolution Progression:
4×4 (Epoch 1-5): Blobs, basic shapes
8×8 (Epoch 6-10): Recognizable objects
16×16 (Epoch 11-15): Detailed features
32×32 (Epoch 16-20): High quality
64×64 (Epoch 21-25): Photo-realistic
✓ Smooth quality improvement
✓ No training collapse
Your implementation is complete when:
1. L1 Loss Weight:
2. PatchGAN Discriminator:
3. Skip Connections:
1. Critic Updates:
2. Gradient Penalty Weight:
3. No BatchNorm in Critic:
1. Fade-In Duration:
2. Resolution Stages:
Implement StyleGAN2 enhancements:
Benefit: Better image quality, fewer artifacts
Implement unpaired image translation:
class CycleGAN:
def __init__(self):
self.G_AB = Generator() # A → B
self.G_BA = Generator() # B → A
self.D_A = Discriminator()
self.D_B = Discriminator()
def cycle_consistency_loss(self, real_A, fake_B, reconstructed_A):
"""
Ensure G_BA(G_AB(A)) ≈ A
"""
return nn.functional.l1_loss(reconstructed_A, real_A)
Use case: Horse <-> Zebra, Summer <-> Winter
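For context, one half of a generator update for the CycleGAN class above could look like the sketch below. The LSGAN-style adversarial term and lambda_cycle = 10 are common choices, but both are assumptions here, not part of the activity specification.

import torch
import torch.nn.functional as F

def cycle_generator_loss(model, real_A, lambda_cycle=10.0):
    """Sketch of the A -> B -> A half of a CycleGAN generator update."""
    fake_B = model.G_AB(real_A)
    rec_A = model.G_BA(fake_B)
    d_out = model.D_B(fake_B)
    adv = F.mse_loss(d_out, torch.ones_like(d_out))   # fool D_B (least-squares GAN loss)
    cycle = F.l1_loss(rec_A, real_A)                  # enforce G_BA(G_AB(A)) ~ A
    return adv + lambda_cycle * cycle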
Single generator for multiple domain translations:
def stargan_generator(img, target_domain):
"""
Generate image with target domain attributes
"""
pass
Use case: Change hair color, age, gender with one model
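One common way to implement this conditioning is to broadcast a one-hot target-domain vector over space and concatenate it with the image channels. The sketch below uses illustrative layer sizes and is far shallower than the real StarGAN generator; treat it only as a picture of the conditioning mechanism.

import torch
import torch.nn as nn

class StarGANGeneratorSketch(nn.Module):
    """Sketch: image + spatially broadcast domain label in, translated image out."""
    def __init__(self, num_domains=5, img_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(img_channels + num_domains, 64, 7, 1, 3), nn.ReLU(inplace=True),
            nn.Conv2d(64, img_channels, 7, 1, 3), nn.Tanh(),   # the real model is much deeper
        )

    def forward(self, img, target_domain):
        # target_domain: one-hot (batch, num_domains) -> broadcast to (batch, num_domains, H, W)
        d = target_domain[:, :, None, None].expand(-1, -1, img.size(2), img.size(3))
        return self.net(torch.cat([img, d], dim=1))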
Train StyleGAN on 1024x1024 images:
Completed Notebook: activity-12-advanced-gan-architectures.ipynb
Generated Samples:
FID Scores:
Analysis (7-10 sentences):
Next Activity: Activity 13 - Diffusion Models
This activity is graded on:
Passing Grade: 70% or higher
Congratulations on mastering advanced GAN architectures! 🎉🚀