Practice and reinforce the concepts from Lesson 8
The Challenge: BookVault's PostgreSQL database is experiencing slow book search queries taking 5+ seconds. You need to create indexes to optimize search performance and reduce query time to under 50ms.
Learning Goal: Master PostgreSQL index creation and use EXPLAIN ANALYZE to measure performance improvements.
Create a B-tree index on PostgreSQL and measure the performance improvement using EXPLAIN ANALYZE. Focus on optimizing book title searches.
💡 Tip: For comprehensive PostgreSQL performance theory, see Reference/Database-Theory/PostgreSQL-Performance-Theory.mdx
Use Docker to create a PostgreSQL database with sample data:
# Create and start PostgreSQL container
docker run -d \
--name bookstore-db \
-e POSTGRES_DB=bookstore \
-e POSTGRES_USER=admin \
-e POSTGRES_PASSWORD=password123 \
-p 5432:5432 \
postgres:15
# Connect to PostgreSQL
docker exec -it bookstore-db psql -U admin -d bookstore
Create the books table with sample data:
-- Create books table
CREATE TABLE books (
id SERIAL PRIMARY KEY,
title VARCHAR(255) NOT NULL,
author VARCHAR(255),
genre VARCHAR(100),
publication_year INTEGER,
price DECIMAL(10,2),
isbn VARCHAR(20),
page_count INTEGER
);
-- Insert 100,000 sample books
INSERT INTO books (title, author, genre, publication_year, price, isbn, page_count)
SELECT
'Book Title ' || i,
'Author ' || (i % 1000 + 1),
CASE (i % 5)
WHEN 0 THEN 'Fiction'
WHEN 1 THEN 'Science'
WHEN 2 THEN 'History'
WHEN 3 THEN 'Biography'
ELSE 'Mystery'
END,
1950 + (i % 70),
9.99 + (i % 50),
'978-' || LPAD(i::text, 10, '0'),
100 + (i % 500)
FROM generate_series(1, 100000) i;
-- Verify data
SELECT COUNT(*) FROM books;
SELECT * FROM books LIMIT 5;
Test book title search performance before creating indexes:
-- Enable timing to measure query duration
\timing on
-- Test query performance without index (this will be slow)
SELECT * FROM books WHERE title = 'Book Title 50000';
-- Use EXPLAIN ANALYZE to analyze query execution
EXPLAIN ANALYZE SELECT * FROM books WHERE title = 'Book Title 50000';
Expected Results (No Index):
Seq Scan on books
Example EXPLAIN output:
Seq Scan on books (cost=0.00..2542.00 rows=1 width=87) (actual time=45.234..89.567 rows=1 loops=1)
Filter: ((title)::text = 'Book Title 50000'::text)
Rows Removed by Filter: 99999
Planning Time: 0.123 ms
Execution Time: 89.590 ms
Create a B-tree index on the title column:
-- Create B-tree index on title column
CREATE INDEX idx_books_title ON books(title);
-- Show all indexes on books table
\d+ books
-- Check index details
SELECT
indexname,
indexdef
FROM pg_indexes
WHERE tablename = 'books';
The B-tree index creates a sorted data structure that allows fast lookups.
Test the same query with the new index:
-- Test query performance WITH index
EXPLAIN ANALYZE SELECT * FROM books WHERE title = 'Book Title 50000';
-- Test multiple queries to see consistent performance
SELECT * FROM books WHERE title = 'Book Title 10000';
SELECT * FROM books WHERE title = 'Book Title 75000';
SELECT * FROM books WHERE title = 'Book Title 99999';
Expected Results (With Index):
Index Scan using idx_books_title
Example EXPLAIN output:
Index Scan using idx_books_title on books (cost=0.29..8.31 rows=1 width=87) (actual time=0.045..0.047 rows=1 loops=1)
Index Cond: ((title)::text = 'Book Title 50000'::text)
Planning Time: 0.234 ms
Execution Time: 0.078 ms
Test how the index performs with different query types:
-- 1. Exact match (uses index efficiently)
EXPLAIN ANALYZE SELECT * FROM books WHERE title = 'Book Title 25000';
-- 2. Pattern matching with LIKE (may not use index)
EXPLAIN ANALYZE SELECT * FROM books WHERE title LIKE 'Book Title 250%';
-- 3. ILIKE case-insensitive search (likely won't use index)
EXPLAIN ANALYZE SELECT * FROM books WHERE title ILIKE 'book title 25000';
-- 4. Range query on indexed column
EXPLAIN ANALYZE SELECT * FROM books WHERE title >= 'Book Title 5' AND title <= 'Book Title 6';
-- 5. Query on non-indexed column (still uses sequential scan)
EXPLAIN ANALYZE SELECT * FROM books WHERE author = 'Author 500';
Notice which queries benefit from the index and which still require sequential scans.
Analyze the index size and usage statistics:
-- Check table and index sizes
SELECT
pg_size_pretty(pg_relation_size('books')) as table_size,
pg_size_pretty(pg_relation_size('idx_books_title')) as index_size,
pg_size_pretty(pg_total_relation_size('books')) as total_size;
-- Get index usage statistics (run some queries first if stats are empty)
SELECT
schemaname,
tablename,
indexname,
idx_scan as index_scans,
idx_tup_read as tuples_read,
idx_tup_fetch as tuples_fetched
FROM pg_stat_user_indexes
WHERE relname = 'books';
-- Update table statistics for query planner
ANALYZE books;
Test how B-tree indexes excel at range queries:
-- Range query without index would be very slow
-- But B-tree indexes are perfect for range queries
EXPLAIN ANALYZE
SELECT title, author, price
FROM books
WHERE title BETWEEN 'Book Title 1000' AND 'Book Title 2000'
ORDER BY title;
-- Test ORDER BY performance (indexes help with sorting)
EXPLAIN ANALYZE
SELECT title, price
FROM books
ORDER BY title
LIMIT 10;
-- Compare with ordering by non-indexed column
EXPLAIN ANALYZE
SELECT title, price
FROM books
ORDER BY author
LIMIT 10;
B-tree indexes maintain sorted order, making them excellent for range queries and ORDER BY operations.
Document your results:
Metric | Without Index | With Index | Improvement |
---|---|---|---|
Query Time | 200-500ms | 1-5ms | 50-500x faster |
Execution Plan | Seq Scan | Index Scan | Efficient lookup |
Rows Examined | 100,000 | 1 | 100,000x fewer |
Storage Overhead | 0 MB | ~5 MB | Minimal cost |
CREATE INDEX idx_name ON table(column)
creates a B-tree indexWHERE column = value
WHERE column BETWEEN x AND y
ORDER BY column
WHERE column LIKE 'prefix%'
>, <, >=, <=
ILIKE
)LIKE '%suffix'
)You've successfully created a PostgreSQL B-tree index and measured its dramatic performance impact. You've seen how proper indexing transforms slow sequential scans into fast index scans, reducing query times from hundreds of milliseconds to just a few milliseconds.
📚 Theory Reference: For advanced PostgreSQL performance concepts, Docker deployment strategies, and comprehensive optimization techniques, see Reference/Database-Theory/PostgreSQL-Performance-Theory.mdx
Complete this activity and submit your work through the Activity Submission Form