Demo Mode

No student ID available

Activity 8 of 12

Activity 8: Indexing and Query Optimization

Practice and reinforce the concepts from Lesson 8

Activity 8: PostgreSQL Index Creation and Query Optimization

🌍 Real-World Scenario

The Challenge: BookVault's PostgreSQL database is experiencing slow book search queries taking 5+ seconds. You need to create indexes to optimize search performance and reduce query time to under 50ms.

Learning Goal: Master PostgreSQL index creation and use EXPLAIN ANALYZE to measure performance improvements.

🎯 Your Mission

Create a B-tree index on PostgreSQL and measure the performance improvement using EXPLAIN ANALYZE. Focus on optimizing book title searches.

📋 What You'll Learn

How to create B-tree indexes in PostgreSQL
Using EXPLAIN ANALYZE to measure query performance
Understanding Sequential Scan vs Index Scan operations

💡 Tip: For comprehensive PostgreSQL performance theory, see Reference/Database-Theory/PostgreSQL-Performance-Theory.mdx

🛠️ Step One: Set Up PostgreSQL Test Environment (10 min)

Use Docker to create a PostgreSQL database with sample data:

bash

# Create and start PostgreSQL container
docker run -d \
  --name bookstore-db \
  -e POSTGRES_DB=bookstore \
  -e POSTGRES_USER=admin \
  -e POSTGRES_PASSWORD=password123 \
  -p 5432:5432 \
  postgres:15

# Connect to PostgreSQL
docker exec -it bookstore-db psql -U admin -d bookstore

Create the books table with sample data:

sql

-- Create books table
CREATE TABLE books (
    id SERIAL PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    author VARCHAR(255),
    genre VARCHAR(100),
    publication_year INTEGER,
    price DECIMAL(10,2),
    isbn VARCHAR(20),
    page_count INTEGER
);

-- Insert 100,000 sample books
INSERT INTO books (title, author, genre, publication_year, price, isbn, page_count)
SELECT 
    'Book Title ' || i,
    'Author ' || (i % 1000 + 1),
    CASE (i % 5) 
        WHEN 0 THEN 'Fiction'
        WHEN 1 THEN 'Science'
        WHEN 2 THEN 'History'
        WHEN 3 THEN 'Biography'
        ELSE 'Mystery'
    END,
    1950 + (i % 70),
    9.99 + (i % 50),
    '978-' || LPAD(i::text, 10, '0'),
    100 + (i % 500)
FROM generate_series(1, 100000) i;

-- Verify data
SELECT COUNT(*) FROM books;
SELECT * FROM books LIMIT 5;

🔍 Step 2: Test Query Performance Without Index (10 min)

Test book title search performance before creating indexes:

sql

-- Enable timing to measure query duration
\timing on

-- Test query performance without index (this will be slow)
SELECT * FROM books WHERE title = 'Book Title 50000';

-- Use EXPLAIN ANALYZE to analyze query execution
EXPLAIN ANALYZE SELECT * FROM books WHERE title = 'Book Title 50000';

Expected Results (No Index):

Query execution time: 200-500ms
Execution plan shows: Seq Scan on books
Rows examined: ~100,000 (full table scan)
Filter removes many rows

Example EXPLAIN output:

sql

Seq Scan on books  (cost=0.00..2542.00 rows=1 width=87) (actual time=45.234..89.567 rows=1 loops=1)
  Filter: ((title)::text = 'Book Title 50000'::text)
  Rows Removed by Filter: 99999
Planning Time: 0.123 ms
Execution Time: 89.590 ms

⚡ Step 3: Create B-tree Index (5 min)

Create a B-tree index on the title column:

sql

-- Create B-tree index on title column
CREATE INDEX idx_books_title ON books(title);

-- Show all indexes on books table
\d+ books

-- Check index details
SELECT 
    indexname,
    indexdef
FROM pg_indexes 
WHERE tablename = 'books';

The B-tree index creates a sorted data structure that allows fast lookups.

📈 Step 4: Measure Performance Improvement (10 min)

Test the same query with the new index:

sql

-- Test query performance WITH index
EXPLAIN ANALYZE SELECT * FROM books WHERE title = 'Book Title 50000';

-- Test multiple queries to see consistent performance
SELECT * FROM books WHERE title = 'Book Title 10000';
SELECT * FROM books WHERE title = 'Book Title 75000';
SELECT * FROM books WHERE title = 'Book Title 99999';

Expected Results (With Index):

Query execution time: 1-5ms (50-500x faster!)
Execution plan shows: Index Scan using idx_books_title
Rows examined: 1 (only the matching row)
Much lower cost values

Example EXPLAIN output:

sql

Index Scan using idx_books_title on books  (cost=0.29..8.31 rows=1 width=87) (actual time=0.045..0.047 rows=1 loops=1)
  Index Cond: ((title)::text = 'Book Title 50000'::text)
Planning Time: 0.234 ms
Execution Time: 0.078 ms

🔄 Step 5: Compare Different Query Patterns (15 min)

Test how the index performs with different query types:

sql

-- 1. Exact match (uses index efficiently)
EXPLAIN ANALYZE SELECT * FROM books WHERE title = 'Book Title 25000';

-- 2. Pattern matching with LIKE (may not use index)
EXPLAIN ANALYZE SELECT * FROM books WHERE title LIKE 'Book Title 250%';

-- 3. ILIKE case-insensitive search (likely won't use index)
EXPLAIN ANALYZE SELECT * FROM books WHERE title ILIKE 'book title 25000';

-- 4. Range query on indexed column
EXPLAIN ANALYZE SELECT * FROM books WHERE title >= 'Book Title 5' AND title <= 'Book Title 6';

-- 5. Query on non-indexed column (still uses sequential scan)
EXPLAIN ANALYZE SELECT * FROM books WHERE author = 'Author 500';

Notice which queries benefit from the index and which still require sequential scans.

📊 Step 6: Check Index Storage and Statistics (5 min)

Analyze the index size and usage statistics:

sql

-- Check table and index sizes
SELECT 
    pg_size_pretty(pg_relation_size('books')) as table_size,
    pg_size_pretty(pg_relation_size('idx_books_title')) as index_size,
    pg_size_pretty(pg_total_relation_size('books')) as total_size;

-- Get index usage statistics (run some queries first if stats are empty)
SELECT 
    schemaname,
    tablename,
    indexname,
    idx_scan as index_scans,
    idx_tup_read as tuples_read,
    idx_tup_fetch as tuples_fetched
FROM pg_stat_user_indexes 
WHERE relname = 'books';

-- Update table statistics for query planner
ANALYZE books;

🎯 Step 7: Test Range Queries (10 min)

Test how B-tree indexes excel at range queries:

sql

-- Range query without index would be very slow
-- But B-tree indexes are perfect for range queries
EXPLAIN ANALYZE 
SELECT title, author, price 
FROM books 
WHERE title BETWEEN 'Book Title 1000' AND 'Book Title 2000'
ORDER BY title;

-- Test ORDER BY performance (indexes help with sorting)
EXPLAIN ANALYZE 
SELECT title, price 
FROM books 
ORDER BY title 
LIMIT 10;

-- Compare with ordering by non-indexed column
EXPLAIN ANALYZE 
SELECT title, price 
FROM books 
ORDER BY author 
LIMIT 10;

B-tree indexes maintain sorted order, making them excellent for range queries and ORDER BY operations.

🏆 Performance Results Summary

Document your results:

Metric	Without Index	With Index	Improvement
Query Time	200-500ms	1-5ms	50-500x faster
Execution Plan	Seq Scan	Index Scan	Efficient lookup
Rows Examined	100,000	1	100,000x fewer
Storage Overhead	0 MB	~5 MB	Minimal cost

🔍 Key Takeaways

Index Creation: CREATE INDEX idx_name ON table(column) creates a B-tree index
Performance Impact: Indexes can improve query speed by 50-500x
EXPLAIN ANALYZE: Essential tool for analyzing PostgreSQL query performance
Index Benefits: B-tree indexes excel at equality, range queries, and sorting
Trade-offs: Indexes use storage space but provide massive performance gains

💡 When to Use B-tree Indexes

Equality queries: WHERE column = value
Range queries: WHERE column BETWEEN x AND y
Sorting: ORDER BY column
Pattern matching: WHERE column LIKE 'prefix%'
Comparison operators: >, <, >=, <=

❌ When B-tree Indexes Don't Help

Case-insensitive searches (ILIKE)
Leading wildcards (LIKE '%suffix')
Complex expressions without expression indexes
Very small tables (overhead > benefit)

✅ Activity Complete

You've successfully created a PostgreSQL B-tree index and measured its dramatic performance impact. You've seen how proper indexing transforms slow sequential scans into fast index scans, reducing query times from hundreds of milliseconds to just a few milliseconds.

📚 Theory Reference: For advanced PostgreSQL performance concepts, Docker deployment strategies, and comprehensive optimization techniques, see Reference/Database-Theory/PostgreSQL-Performance-Theory.mdx

📤 Submit Your Work

Complete this activity and submit your work through the Activity Submission Form

Activity 8 of 12

Activity 8: Indexing and Query Optimization

Practice and reinforce the concepts from Lesson 8

Activity 8: PostgreSQL Index Creation and Query Optimization

How to create B-tree indexes in PostgreSQL
Using EXPLAIN ANALYZE to measure query performance
Understanding Sequential Scan vs Index Scan operations

💡 Tip: For comprehensive PostgreSQL performance theory, see Reference/Database-Theory/PostgreSQL-Performance-Theory.mdx

🛠️ Step One: Set Up PostgreSQL Test Environment (10 min)

Use Docker to create a PostgreSQL database with sample data:

bash

# Create and start PostgreSQL container
docker run -d \
  --name bookstore-db \
  -e POSTGRES_DB=bookstore \
  -e POSTGRES_USER=admin \
  -e POSTGRES_PASSWORD=password123 \
  -p 5432:5432 \
  postgres:15

# Connect to PostgreSQL
docker exec -it bookstore-db psql -U admin -d bookstore

Create the books table with sample data:

sql

-- Create books table
CREATE TABLE books (
    id SERIAL PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    author VARCHAR(255),
    genre VARCHAR(100),
    publication_year INTEGER,
    price DECIMAL(10,2),
    isbn VARCHAR(20),
    page_count INTEGER
);

-- Insert 100,000 sample books
INSERT INTO books (title, author, genre, publication_year, price, isbn, page_count)
SELECT 
    'Book Title ' || i,
    'Author ' || (i % 1000 + 1),
    CASE (i % 5) 
        WHEN 0 THEN 'Fiction'
        WHEN 1 THEN 'Science'
        WHEN 2 THEN 'History'
        WHEN 3 THEN 'Biography'
        ELSE 'Mystery'
    END,
    1950 + (i % 70),
    9.99 + (i % 50),
    '978-' || LPAD(i::text, 10, '0'),
    100 + (i % 500)
FROM generate_series(1, 100000) i;

-- Verify data
SELECT COUNT(*) FROM books;
SELECT * FROM books LIMIT 5;

🔍 Step 2: Test Query Performance Without Index (10 min)

Test book title search performance before creating indexes:

sql

-- Enable timing to measure query duration
\timing on

-- Test query performance without index (this will be slow)
SELECT * FROM books WHERE title = 'Book Title 50000';

-- Use EXPLAIN ANALYZE to analyze query execution
EXPLAIN ANALYZE SELECT * FROM books WHERE title = 'Book Title 50000';

Expected Results (No Index):

Query execution time: 200-500ms
Execution plan shows: Seq Scan on books
Rows examined: ~100,000 (full table scan)
Filter removes many rows

Example EXPLAIN output:

sql

Seq Scan on books  (cost=0.00..2542.00 rows=1 width=87) (actual time=45.234..89.567 rows=1 loops=1)
  Filter: ((title)::text = 'Book Title 50000'::text)
  Rows Removed by Filter: 99999
Planning Time: 0.123 ms
Execution Time: 89.590 ms

⚡ Step 3: Create B-tree Index (5 min)

Create a B-tree index on the title column:

sql

-- Create B-tree index on title column
CREATE INDEX idx_books_title ON books(title);

-- Show all indexes on books table
\d+ books

-- Check index details
SELECT 
    indexname,
    indexdef
FROM pg_indexes 
WHERE tablename = 'books';

The B-tree index creates a sorted data structure that allows fast lookups.

📈 Step 4: Measure Performance Improvement (10 min)

Test the same query with the new index:

sql

-- Test query performance WITH index
EXPLAIN ANALYZE SELECT * FROM books WHERE title = 'Book Title 50000';

-- Test multiple queries to see consistent performance
SELECT * FROM books WHERE title = 'Book Title 10000';
SELECT * FROM books WHERE title = 'Book Title 75000';
SELECT * FROM books WHERE title = 'Book Title 99999';

Expected Results (With Index):

Query execution time: 1-5ms (50-500x faster!)
Execution plan shows: Index Scan using idx_books_title
Rows examined: 1 (only the matching row)
Much lower cost values

Example EXPLAIN output:

sql

Index Scan using idx_books_title on books  (cost=0.29..8.31 rows=1 width=87) (actual time=0.045..0.047 rows=1 loops=1)
  Index Cond: ((title)::text = 'Book Title 50000'::text)
Planning Time: 0.234 ms
Execution Time: 0.078 ms

🔄 Step 5: Compare Different Query Patterns (15 min)

Test how the index performs with different query types:

sql

-- 1. Exact match (uses index efficiently)
EXPLAIN ANALYZE SELECT * FROM books WHERE title = 'Book Title 25000';

-- 2. Pattern matching with LIKE (may not use index)
EXPLAIN ANALYZE SELECT * FROM books WHERE title LIKE 'Book Title 250%';

-- 3. ILIKE case-insensitive search (likely won't use index)
EXPLAIN ANALYZE SELECT * FROM books WHERE title ILIKE 'book title 25000';

-- 4. Range query on indexed column
EXPLAIN ANALYZE SELECT * FROM books WHERE title >= 'Book Title 5' AND title <= 'Book Title 6';

-- 5. Query on non-indexed column (still uses sequential scan)
EXPLAIN ANALYZE SELECT * FROM books WHERE author = 'Author 500';

Notice which queries benefit from the index and which still require sequential scans.

📊 Step 6: Check Index Storage and Statistics (5 min)

Analyze the index size and usage statistics:

sql

-- Check table and index sizes
SELECT 
    pg_size_pretty(pg_relation_size('books')) as table_size,
    pg_size_pretty(pg_relation_size('idx_books_title')) as index_size,
    pg_size_pretty(pg_total_relation_size('books')) as total_size;

-- Get index usage statistics (run some queries first if stats are empty)
SELECT 
    schemaname,
    tablename,
    indexname,
    idx_scan as index_scans,
    idx_tup_read as tuples_read,
    idx_tup_fetch as tuples_fetched
FROM pg_stat_user_indexes 
WHERE relname = 'books';

-- Update table statistics for query planner
ANALYZE books;

🎯 Step 7: Test Range Queries (10 min)

Test how B-tree indexes excel at range queries:

sql

-- Range query without index would be very slow
-- But B-tree indexes are perfect for range queries
EXPLAIN ANALYZE 
SELECT title, author, price 
FROM books 
WHERE title BETWEEN 'Book Title 1000' AND 'Book Title 2000'
ORDER BY title;

-- Test ORDER BY performance (indexes help with sorting)
EXPLAIN ANALYZE 
SELECT title, price 
FROM books 
ORDER BY title 
LIMIT 10;

-- Compare with ordering by non-indexed column
EXPLAIN ANALYZE 
SELECT title, price 
FROM books 
ORDER BY author 
LIMIT 10;

B-tree indexes maintain sorted order, making them excellent for range queries and ORDER BY operations.

🏆 Performance Results Summary

Document your results:

Metric	Without Index	With Index	Improvement
Query Time	200-500ms	1-5ms	50-500x faster
Execution Plan	Seq Scan	Index Scan	Efficient lookup
Rows Examined	100,000	1	100,000x fewer
Storage Overhead	0 MB	~5 MB	Minimal cost

🔍 Key Takeaways

Index Creation: CREATE INDEX idx_name ON table(column) creates a B-tree index
Performance Impact: Indexes can improve query speed by 50-500x
EXPLAIN ANALYZE: Essential tool for analyzing PostgreSQL query performance
Index Benefits: B-tree indexes excel at equality, range queries, and sorting
Trade-offs: Indexes use storage space but provide massive performance gains

💡 When to Use B-tree Indexes

Equality queries: WHERE column = value
Range queries: WHERE column BETWEEN x AND y
Sorting: ORDER BY column
Pattern matching: WHERE column LIKE 'prefix%'
Comparison operators: >, <, >=, <=

❌ When B-tree Indexes Don't Help

Case-insensitive searches (ILIKE)
Leading wildcards (LIKE '%suffix')
Complex expressions without expression indexes
Very small tables (overhead > benefit)

✅ Activity Complete

📚 Theory Reference: For advanced PostgreSQL performance concepts, Docker deployment strategies, and comprehensive optimization techniques, see Reference/Database-Theory/PostgreSQL-Performance-Theory.mdx

📤 Submit Your Work

Complete this activity and submit your work through the Activity Submission Form