Student starter code (30% baseline)
- index.html - Main HTML page
- script.js - JavaScript logic
- styles.css - Styling and layout
- package.json - Dependencies
- setup.sh - Setup script
- README.md - Instructions (below)

💡 Download the ZIP, extract it, and follow the instructions below to get started!
By completing this activity, you will:
Runtime -> Run all (or press Ctrl+F9)

Expected First Run Time: ~45 seconds
The template comes with 65% working code:
Location: Section 4 - "VAE Model Architecture"
Current State: Encoder produces μ (mu) and log_σ (log-sigma), but sampling z is not implemented
Your Task: Implement the reparameterization trick to sample latent code z from N(μ, σ²):
z = μ + σ * ε where ε ~ N(0, 1)
σ = exp(log_σ / 2)
Why This Matters:
Starter Code Provided:
```python
def reparameterize(self, mu, log_var):
    """
    Reparameterization trick: z = μ + σ * ε

    Args:
        mu: Mean of latent distribution (batch_size, latent_dim)
        log_var: Log variance of latent distribution (batch_size, latent_dim)

    Returns:
        z: Sampled latent code (batch_size, latent_dim)
    """
    # TODO: Implement reparameterization trick
    # Hint 1: std = torch.exp(log_var / 2)
    # Hint 2: eps = torch.randn_like(std)
    # Hint 3: z = mu + std * eps
    pass
```
Success Criteria:

- z is stochastic: two calls with the same (mu, log_var) return different samples
- Gradients flow through the sample (z.requires_grad == True)

Test Your Implementation:
```python
import torch

# Test: z should be different each time
mu = torch.zeros(1, 20)
log_var = torch.zeros(1, 20)
z1 = model.reparameterize(mu, log_var)
z2 = model.reparameterize(mu, log_var)
assert not torch.allclose(z1, z2), "z should be stochastic!"
print("✅ Reparameterization trick working!")
```
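Putting the three hints together, one possible implementation looks like the sketch below (written as a standalone function so it can be tested outside the model class; inside your VAE it would be a method taking `self`):

```python
import torch

def reparameterize(mu, log_var):
    # std = exp(log_var / 2): convert log-variance to standard deviation
    std = torch.exp(log_var / 2)
    # eps ~ N(0, 1), same shape as std; the randomness lives in eps,
    # so gradients can still flow through mu and std
    eps = torch.randn_like(std)
    # z = mu + std * eps: a differentiable sample from N(mu, sigma^2)
    return mu + std * eps

# Quick check: two draws from the same distribution should differ
z1 = reparameterize(torch.zeros(1, 20), torch.zeros(1, 20))
z2 = reparameterize(torch.zeros(1, 20), torch.zeros(1, 20))
assert not torch.allclose(z1, z2)
```

The key design point is that sampling happens through `eps`, a constant with respect to the network parameters, which is what lets backpropagation reach the encoder.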
Location: Section 5 - "Loss Function"
Your Task: Implement the Evidence Lower Bound (ELBO) loss for VAE training:
ELBO = E[log p(x|z)] - KL[q(z|x) || p(z)]
Loss = -ELBO = reconstruction_loss + β * KL_divergence
Components:

- Reconstruction Loss: How well can the decoder reconstruct the input? BCE(x, x_reconstructed)
- KL Divergence: How close is q(z|x) to the prior p(z) = N(0, I)? KL = -0.5 * Σ[1 + log_σ² - μ² - σ²]
- Beta Weighting: Balance reconstruction vs regularization
Starter Code Provided:
```python
def vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """
    Compute VAE loss (negative ELBO).

    Args:
        x: Original images (batch_size, 1, 28, 28)
        x_recon: Reconstructed images (batch_size, 1, 28, 28)
        mu: Latent mean (batch_size, latent_dim)
        log_var: Latent log variance (batch_size, latent_dim)
        beta: Weight for KL divergence (default 1.0)

    Returns:
        loss: Total loss (scalar)
        recon_loss: Reconstruction loss (for logging)
        kl_loss: KL divergence (for logging)
    """
    # TODO 2.1: Compute reconstruction loss
    # Hint: Use F.binary_cross_entropy(x_recon, x, reduction='sum')
    # Hint: Divide by batch size
    recon_loss = None

    # TODO 2.2: Compute KL divergence
    # Hint: KL = -0.5 * sum(1 + log_var - mu^2 - exp(log_var))
    # Hint: Sum over latent_dim, average over batch
    kl_loss = None

    # TODO 2.3: Combine losses
    loss = recon_loss + beta * kl_loss
    return loss, recon_loss, kl_loss
```
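For reference, a completed version that follows the hints literally might look like the sketch below (one reasonable reading of the TODOs, not necessarily the official solution):

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, log_var, beta=1.0):
    batch_size = x.size(0)
    # 2.1: sum BCE over all pixels, then average over the batch
    recon_loss = F.binary_cross_entropy(x_recon, x, reduction='sum') / batch_size
    # 2.2: closed-form KL(q(z|x) || N(0, I)), summed over latent dims,
    # averaged over the batch
    kl_loss = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp()) / batch_size
    # 2.3: negative ELBO with beta weighting
    loss = recon_loss + beta * kl_loss
    return loss, recon_loss, kl_loss

# Sanity check: with mu = 0 and log_var = 0, the KL term is exactly zero
x = torch.rand(8, 1, 28, 28)
loss, recon, kl = vae_loss(x, x.clone(), torch.zeros(8, 20), torch.zeros(8, 20))
```

Note that `binary_cross_entropy` expects both inputs in [0, 1], so the decoder's final activation should be a sigmoid.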
Success Criteria:
Location: Section 8 - "Latent Space Analysis"
Your Task: Visualize what each dimension of the latent space controls by traversing one dimension at a time.
Requirements:
Success Criteria:
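One way to start the traversal is sketched below; the `decode` callable, the latent size, and the sweep range are assumptions about your trained model, not part of the starter code:

```python
import torch

@torch.no_grad()
def traverse_dimension(decode, dim, latent_dim=20, steps=7, span=3.0):
    """Sweep one latent dimension from -span to +span, holding the rest at 0."""
    z = torch.zeros(steps, latent_dim)
    z[:, dim] = torch.linspace(-span, span, steps)
    return decode(z)  # one decoded image per step

# Usage idea: images = traverse_dimension(model.decode, dim=0)
# then plot the returned images side by side to see what dimension 0 controls.
```

Repeating this for each `dim` and plotting the rows in a grid gives the full per-dimension picture of the latent space.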
- Modify VAE to condition on class labels
- Implement β-VAE with β > 1.0
- Implement IWAE for a tighter ELBO bound
- Train VAE with latent_dim=2 for full visualization
Minimum Requirements:
Target Grade:
This is normal for VAEs! Try increasing latent dimension or reducing beta.
Use KL annealing: Start with β=0, gradually increase to 1.0 over epochs.
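That annealing tip can be a one-line schedule; `warmup_epochs` here is an illustrative parameter, not something defined in the starter code:

```python
def kl_beta(epoch, warmup_epochs=10, beta_max=1.0):
    """Linearly ramp beta from 0 to beta_max over the first warmup_epochs epochs."""
    return min(beta_max, beta_max * epoch / warmup_epochs)

# In the training loop, pass kl_beta(epoch) as the beta argument of vae_loss
# so early epochs focus on reconstruction before the KL term kicks in.
```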
- pytorch/examples
- zalandoresearch/fashion-mnist

Good luck! VAEs are the foundation of modern generative AI! 🚀