ℹ️ Definition Generative Models are machine learning models that learn the underlying probability distribution of data to generate new, realistic samples that resemble the training data, enabling applications like image synthesis, text generation, and content creation.
By the end of this lesson, you will:
In Lessons 1-8, we learned Reinforcement Learning - teaching agents to make decisions through trial and error. Now we shift to a different paradigm: Generative AI - teaching models to create new content.
Examples of Generative AI:
These systems generate content that resembles their training data yet is genuinely new.
Goal: Learn the boundary between classes
Examples:
Model: P(y|x) - Probability of label y given input x
Analogy: A discriminative model is like a judge who can tell real paintings from fakes, but can't create paintings.
Goal: Learn the distribution of data to generate new samples
Examples:
Model: P(x) - Probability distribution of data x
Analogy: A generative model is like an artist who can create new paintings that look real.
| Aspect | Discriminative | Generative |
|---|---|---|
| Learns | Decision boundary | Data distribution |
| Model | P(y|x) | P(x) or P(x,y) |
| Task | Classification, regression | Generation, sampling |
| Example | "Is this a dog?" | "Generate a dog image" |
| Difficulty | Generally easier (predicts a low-dimensional output) | Generally harder (models a high-dimensional distribution) |
Can do both:

Generative models learn: P(x) = probability distribution over data
Example - 1D Data (Heights):
Heights ~ N(μ=170cm, σ=10cm)
P(height = 180cm) ≈ 0.024 (a probability density, not a probability)
To generate new samples: sample from N(170, 10)
Example - High-Dimensional (Images):
Images: 256×256 RGB = 196,608 dimensions
P(x) over all possible images is intractably complex!
Generative models approximate P(x) with neural networks
Sampling: Drawing new data points from learned distribution
# Simple example: sampling from a known Gaussian distribution
import numpy as np

mu, sigma = 0, 1
sample = np.random.normal(mu, sigma)      # Generate a new data point

# Generative model: sampling from a learned, complex distribution
# (here `generator` stands for a trained neural network, e.g. a GAN generator)
latent_code = np.random.randn(128)        # Random noise vector
generated_image = generator(latent_code)  # Decode noise into an image
Likelihood: How probable is observed data under our model?
Likelihood = P(x | model parameters)
High likelihood: Model explains data well
Low likelihood: Model doesn't fit data
Training objective: Maximize likelihood of training data
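As a quick illustration, here is a minimal sketch comparing total log-likelihoods of the height data above under two candidate Gaussian models (the parameter values are illustrative, not fitted):

```python
import numpy as np
from scipy.stats import norm

data = np.random.normal(170, 10, size=1000)   # observed heights (cm)

# Total log-likelihood of the data under two candidate models
good_fit = norm.logpdf(data, loc=170, scale=10).sum()
poor_fit = norm.logpdf(data, loc=150, scale=5).sum()

print(good_fit > poor_fit)  # True: the model that explains the data gets higher likelihood
```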
Latent space: Low-dimensional representation of high-dimensional data
Intuition:
Example:
Latent code [0.5, 0.2, -0.3, ...] → Image of "young man with beard"
Latent code [0.6, 0.3, -0.2, ...] → Image of "young woman with smile"
Because nearby latent codes decode to similar images, smooth interpolation between codes is possible.
Operations in latent space:
man_with_glasses - man_without_glasses + woman_without_glasses ≈ woman_with_glasses
This arithmetic works because latent space has structure!
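Below is a minimal sketch of latent-space interpolation. The `generator` here is a stand-in (a fixed random linear map) so the code runs on its own; in a real system it would be a trained network such as a GAN generator or VAE decoder, and the code names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "generator": a fixed random linear map from 128-d codes to 8x8 "images".
# In practice this is a trained network (e.g. a GAN generator or VAE decoder).
W = rng.normal(size=(64, 128))
def generator(z):
    return (W @ z).reshape(8, 8)

z_a = rng.normal(size=128)   # latent code for image A
z_b = rng.normal(size=128)   # latent code for image B

# Linear interpolation: nearby codes decode to similar outputs,
# so the sequence morphs smoothly from A to B
frames = [generator((1 - a) * z_a + a * z_b) for a in np.linspace(0, 1, 8)]

# Latent arithmetic combines codes before decoding, e.g.
# z_new = z_man_with_glasses - z_man + z_woman, then generator(z_new)
```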

Explicitly model P(x)
Advantages:
Disadvantages:
Examples:
Autoregressive Models (PixelCNN, GPT)
Variational Autoencoders (VAEs)
Normalizing Flows
Don't explicitly model P(x), just generate samples
Advantages:
Disadvantages:
Examples:
Generative Adversarial Networks (GANs)
Diffusion Models

Simple generative model: Data comes from mixture of Gaussians
P(x) = Σ_k π_k * N(x | μ_k, Σ_k)
Where: π_k are the mixture weights (π_k ≥ 0, Σ_k π_k = 1), and μ_k, Σ_k are the mean and covariance of component k
Generation: pick a component k with probability π_k, then sample x ~ N(μ_k, Σ_k)
Training: Expectation-Maximization (EM) algorithm
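As a toy instance of the formula above, the sketch below evaluates a hand-picked 1-D mixture density at a single point; the weights, means, and standard deviations are made up for illustration (a full EM-trained GMM appears in the hands-on code at the end of this lesson):

```python
import numpy as np
from scipy.stats import norm

pi = [0.5, 0.3, 0.2]        # mixture weights (sum to 1)
mu = [0.0, 5.0, -3.0]       # component means
sigma = [1.0, 2.0, 0.5]     # component standard deviations

x = 1.0
p_x = sum(w * norm.pdf(x, loc=m, scale=s) for w, m, s in zip(pi, mu, sigma))
print(p_x)                  # value of the mixture density P(x) at x = 1.0
```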
Architecture:
Encoder: x → z (infer latent code from data)
Decoder: z → x̂ (reconstruct data from latent code)
Training objective:
ELBO = E_q(z|x)[log p(x|z)] - KL(q(z|x) || p(z))
The first term is the reconstruction quality; the KL term is a regularizer that keeps the approximate posterior q(z|x) close to the prior p(z).
Generation: sample z ~ p(z) = N(0, I) from the prior, then decode x̂ = decoder(z)
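A minimal VAE sketch in PyTorch is shown below, assuming `torch` is installed. The single-linear-layer encoder/decoder and the layer sizes are purely illustrative, and inputs are assumed to lie in [0, 1] so binary cross-entropy can serve as the reconstruction term:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=16):
        super().__init__()
        self.enc = nn.Linear(x_dim, 2 * z_dim)  # outputs mean and log-variance of q(z|x)
        self.dec = nn.Linear(z_dim, x_dim)      # maps a latent code back to data space
        self.z_dim = z_dim

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        x_hat = torch.sigmoid(self.dec(z))
        # Negative ELBO = reconstruction loss + KL(q(z|x) || N(0, I))
        recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return recon + kl  # minimizing this maximizes the ELBO

vae = TinyVAE()
x = torch.rand(8, 784)             # stand-in batch of "images" with values in [0, 1]
loss = vae(x)                      # negative ELBO for this batch

# Generation: sample from the prior p(z) = N(0, I), then decode
z = torch.randn(1, vae.z_dim)
x_new = torch.sigmoid(vae.dec(z))
```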
Architecture:
Generator: z → x (create fake samples)
Discriminator: x → [0,1] (judge real vs fake)
Training: Adversarial game
Generator: Fool discriminator (generate realistic samples)
Discriminator: Distinguish real from fake
min_G max_D E_{x~p_data}[log D(x)] + E_{z~p(z)}[log(1 - D(G(z)))]
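The sketch below shows one adversarial training step on random stand-in data, assuming PyTorch is available; the network sizes, learning rates, and batch size are illustrative only:

```python
import torch
import torch.nn as nn

z_dim, x_dim, batch = 64, 784, 32
G = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, x_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(x_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.rand(batch, x_dim) * 2 - 1   # stand-in for a batch of real data in [-1, 1]
z = torch.randn(batch, z_dim)

# Discriminator step: push D(real) toward 1 and D(fake) toward 0
fake = G(z).detach()                      # detach so this step does not update G
d_loss = bce(D(real), torch.ones(batch, 1)) + bce(D(fake), torch.zeros(batch, 1))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# Generator step: fool the discriminator (push D(G(z)) toward 1)
g_loss = bce(D(G(z)), torch.ones(batch, 1))
opt_g.zero_grad()
g_loss.backward()
opt_g.step()
```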
Idea: Gradually add noise to data, then learn to reverse the process
Forward process (diffusion):
x₀ → x₁ → x₂ → ... → x_T ~ N(0,I)
Reverse process (generation):
x_T ~ N(0,I) → x_{T-1} → ... → x₁ → x₀
Generation: Start from noise, iteratively denoise
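Here is a minimal numpy sketch of the forward (noising) process with a simple linear β schedule; the schedule values and the random "image" are illustrative stand-ins:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)    # cumulative product: alpha_bar_t

x0 = np.random.rand(32, 32, 3)          # stand-in for a clean image with values in [0, 1]

def noisy_sample(x0, t):
    # Closed form of the forward process:
    # q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)
    eps = np.random.randn(*x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

x_mid = noisy_sample(x0, t=500)         # partially corrupted image
x_T = noisy_sample(x0, t=T - 1)         # almost pure Gaussian noise
# The reverse (generation) process trains a network to predict and remove
# this noise step by step, starting from x_T ~ N(0, I).
```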
Idea: Model P(x) as product of conditional probabilities
P(x₁, x₂, ..., x_n) = P(x₁) * P(x₂|x₁) * P(x₃|x₁,x₂) * ...
Generation: Sequential sampling
Sample x₁ ~ P(x₁)
Sample x₂ ~ P(x₂|x₁)
Sample x₃ ~ P(x₃|x₁,x₂)
...
Examples: GPT (text), PixelCNN (images)
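Below is a toy sketch of sequential sampling using a bigram-style model, where each symbol conditions only on the previous one (a simplification; models like GPT condition on the full prefix). The vocabulary and probability tables are made up:

```python
import numpy as np

vocab = ["a", "b", "c"]
start_probs = np.array([0.4, 0.4, 0.2])  # P(x_1)
next_probs = np.array([                  # next_probs[i] = P(x_t | x_{t-1} = vocab[i])
    [0.1, 0.6, 0.3],
    [0.5, 0.2, 0.3],
    [0.3, 0.3, 0.4],
])

rng = np.random.default_rng(0)
x = [rng.choice(3, p=start_probs)]                    # sample x_1 ~ P(x_1)
for _ in range(9):
    x.append(rng.choice(3, p=next_probs[x[-1]]))      # sample x_t given the previous symbol
print("".join(vocab[i] for i in x))
```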
1. Inception Score (IS)
2. Fréchet Inception Distance (FID)
3. Human Evaluation
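As an illustration of how FID (item 2 above) is computed, here is a minimal sketch that assumes `real_feats` and `gen_feats` are Inception-network feature matrices with one row per image; the random arrays at the end are stand-ins, not real features:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(real_feats, gen_feats):
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_g = np.cov(gen_feats, rowvar=False)
    covmean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerical error
    # FID = ||mu_r - mu_g||^2 + Tr(cov_r + cov_g - 2 * (cov_r cov_g)^(1/2))
    return np.sum((mu_r - mu_g) ** 2) + np.trace(cov_r + cov_g - 2 * covmean)

# Stand-in features; in practice these come from an InceptionV3 feature extractor
print(fid(np.random.randn(500, 64), np.random.randn(500, 64) + 0.5))
```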
1. Perplexity
2. BLEU/ROUGE/METEOR
3. Human Evaluation
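A minimal sketch of the perplexity calculation, using made-up per-token probabilities from a hypothetical language model:

```python
import numpy as np

# P(token_i | context) assigned by a (hypothetical) language model to each token
token_probs = np.array([0.2, 0.05, 0.4, 0.1, 0.3])

perplexity = np.exp(-np.mean(np.log(token_probs)))
print(perplexity)  # lower is better; a uniform model over V tokens has perplexity V
```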
1. Likelihood (if computable)
2. Sample Quality
3. Interpolation
Image Synthesis:
Image-to-Image Translation:
Image Editing:
Text Generation:
Translation:
Summarization:
Speech Synthesis:
Music Generation:
Drug Discovery:
Protein Design:
Materials Science:
Problem: Model generates limited variety of samples
Example:
Solutions:
Problem: Training diverges or oscillates
Example:
Solutions:
Problem: Hard to measure generation quality
Challenge:
Solutions:
Problem: Training generative models is expensive
Example:
Solutions:
1. Deepfakes
2. Copyright and Ownership
3. Bias and Fairness
4. Environmental Impact
1. Watermarking
2. Attribution
3. Bias Mitigation
4. Regulation
import numpy as np
from scipy.stats import multivariate_normal

class GMM:
    def __init__(self, n_components=3):
        self.n_components = n_components

    def fit(self, X, n_iterations=100):
        n_samples, n_features = X.shape

        # Initialize parameters randomly: uniform weights, random data points
        # as means, identity covariances
        self.weights = np.ones(self.n_components) / self.n_components
        self.means = X[np.random.choice(n_samples, self.n_components, replace=False)]
        self.covariances = [np.eye(n_features) for _ in range(self.n_components)]

        # EM algorithm
        for iteration in range(n_iterations):
            # E-step: Compute responsibilities
            responsibilities = self._e_step(X)
            # M-step: Update parameters
            self._m_step(X, responsibilities)

    def _e_step(self, X):
        # Compute weighted probability of each point under each component
        responsibilities = np.zeros((X.shape[0], self.n_components))
        for k in range(self.n_components):
            responsibilities[:, k] = self.weights[k] * \
                multivariate_normal.pdf(X, self.means[k], self.covariances[k])
        # Normalize so each row sums to 1
        responsibilities /= responsibilities.sum(axis=1, keepdims=True)
        return responsibilities

    def _m_step(self, X, responsibilities):
        # Update weights, means, covariances from the responsibilities
        N_k = responsibilities.sum(axis=0)
        self.weights = N_k / X.shape[0]
        self.means = (responsibilities.T @ X) / N_k[:, np.newaxis]
        for k in range(self.n_components):
            diff = X - self.means[k]
            self.covariances[k] = (responsibilities[:, k, np.newaxis] * diff).T @ diff / N_k[k]

    def sample(self, n_samples=1):
        # Generate new samples: choose a component, then draw from its Gaussian
        samples = []
        for _ in range(n_samples):
            # Choose component
            k = np.random.choice(self.n_components, p=self.weights)
            # Sample from component
            sample = np.random.multivariate_normal(self.means[k], self.covariances[k])
            samples.append(sample)
        return np.array(samples)

# Example usage
X = np.random.randn(1000, 2)  # Training data
gmm = GMM(n_components=3)
gmm.fit(X)

# Generate new samples
generated_samples = gmm.sample(n_samples=100)
In the next lessons, we'll dive deep into specific generative models:
Get ready to build state-of-the-art generative AI systems!