Practice and reinforce the concepts from Lesson 9
Welcome to FoodRush, the hottest new food delivery startup! You've been hired as the lead database architect to design their data model from scratch. The company is growing fast, and they need a robust, scalable database design.
Design a complete database system for a food delivery platform, starting simple and progressively adding complexity. You'll apply normalization principles to ensure data integrity and efficiency.
The founders have been tracking everything in one massive spreadsheet:
-- The current "database" (a single messy table)
CREATE TABLE FoodRushData (
order_id INT,
customer_name VARCHAR(100),
customer_phone VARCHAR(20),
customer_address TEXT,
restaurant_name VARCHAR(100),
restaurant_address TEXT,
restaurant_cuisine VARCHAR(50),
driver_name VARCHAR(100),
driver_phone VARCHAR(20),
driver_vehicle VARCHAR(50),
order_items TEXT,
item_prices TEXT,
order_total DECIMAL(10,2),
order_date DATE,
delivery_time TIME,
payment_method VARCHAR(50),
special_instructions TEXT
);
-- Sample data showing the problems
INSERT INTO FoodRushData VALUES
(1, 'Alice Chen', '555-1111', '789 Pine St', 'Pizza Palace', '123 Main St', 'Italian',
'Bob Driver', '555-2222', 'Honda Civic', 'Large Pizza, Garlic Bread', '18.99, 6.99',
25.98, '2024-03-20', '19:30', 'Credit Card', 'Extra cheese'),
(2, 'Alice Chen', '555-1111', '789 Pine St', 'Burger Barn', '456 Oak Ave', 'American',
'Carol Driver', '555-3333', 'Toyota Camry', 'Double Burger, Fries, Coke', '12.99, 4.99, 2.99',
20.97, '2024-03-21', '12:15', 'Cash', 'No onions'),
(3, 'David Kim', '555-4444', '321 Elm Dr', 'Pizza Palace', '123 Main St', 'Italian',
'Bob Driver', '555-2222', 'Honda Civic', 'Medium Pizza', '14.99',
14.99, '2024-03-21', '18:45', 'Credit Card', 'Well done');
Look at the messy table above and identify at least 5 problems with this design.
Start simple! Design an ER diagram with just these three entities:
Your Task: Define the entities with their attributes and relationships.
# Diagram converted from Mermaid to YAML for better CSS control
type: entity-relationship
entities:
- name: CUSTOMER
attributes:
- type: int
name: customer_id
key: primary
- type: string
name: name
- type: string
name: phone
- type: string
name: email
- name: RESTAURANT
attributes:
- type: int
name: restaurant_id
key: primary
- type: string
name: name
- type: string
name: cuisine_type
- type: string
name: phone
- name: ORDER
attributes:
- type: int
name: order_id
key: primary
- type: int
name: customer_id
key: foreign
- type: int
name: restaurant_id
key: foreign
- type: datetime
name: order_date
- type: decimal
name: total_amount
- type: string
name: status
relationships:
- from: CUSTOMER
to: ORDER
type: one-to-many
label: places
- from: RESTAURANT
to: ORDER
type: one-to-many
label: receives
The current design violates 1NF because it has multi-valued attributes (order_items, item_prices).
Your Task: Redesign to achieve 1NF by eliminating repeating groups.
Create a separate OrderItems table:
CREATE TABLE Orders (
order_id INT PRIMARY KEY,
customer_id INT,
restaurant_id INT,
order_date DATETIME,
total_amount DECIMAL(10,2),
status VARCHAR(50)
);
CREATE TABLE OrderItems (
order_item_id INT PRIMARY KEY,
order_id INT,
item_name VARCHAR(100),
quantity INT,
unit_price DECIMAL(10,2),
FOREIGN KEY (order_id) REFERENCES Orders(order_id)
);
Now add these entities to your design:
Think about the relationships between all entities.
# Diagram converted from Mermaid to YAML for better CSS control
type: entity-relationship
entities:
- name: CUSTOMER
attributes:
- type: int
name: customer_id
key: primary
- type: string
name: name
- type: string
name: phone
- type: string
name: email
- type: string
name: default_address
- name: RESTAURANT
attributes:
- type: int
name: restaurant_id
key: primary
- type: string
name: name
- type: string
name: cuisine_type
- type: string
name: phone
- type: string
name: address
- type: time
name: opening_time
- type: time
name: closing_time
- name: DRIVER
attributes:
- type: int
name: driver_id
key: primary
- type: string
name: name
- type: string
name: phone
- type: string
name: vehicle_type
- type: string
name: license_plate
- name: MENU_ITEM
attributes:
- type: int
name: item_id
key: primary
- type: int
name: restaurant_id
key: foreign
- type: string
name: item_name
- type: string
name: description
- type: decimal
name: price
- type: string
name: category
- name: ORDER
attributes:
- type: int
name: order_id
key: primary
- type: int
name: customer_id
key: foreign
- type: int
name: restaurant_id
key: foreign
- type: datetime
name: order_date
- type: decimal
name: total_amount
- type: string
name: status
- type: string
name: special_instructions
- name: ORDER_ITEM
attributes:
- type: int
name: order_item_id
key: primary
- type: int
name: order_id
key: foreign
- type: int
name: item_id
key: foreign
- type: int
name: quantity
- type: decimal
name: subtotal
- name: DELIVERY
attributes:
- type: int
name: delivery_id
key: primary
- type: int
name: order_id
key: foreign
- type: int
name: driver_id
key: foreign
- type: datetime
name: pickup_time
- type: datetime
name: delivery_time
- type: string
name: delivery_address
- type: string
name: status
relationships:
- from: CUSTOMER
to: ORDER
type: one-to-many
label: places
- from: RESTAURANT
to: ORDER
type: one-to-many
label: receives
- from: RESTAURANT
to: MENU_ITEM
type: one-to-many
label: offers
- from: ORDER
to: ORDER_ITEM
type: one-to-many
label: contains
- from: MENU_ITEM
to: ORDER_ITEM
type: one-to-many
label: ordered in
- from: ORDER
to: DELIVERY
type: one-to-one
label: has
- from: DRIVER
to: DELIVERY
type: one-to-many
label: handles
Review your design. Are there any partial dependencies on composite keys?
Your Task: Ensure all non-key attributes depend on the entire primary key.
Our design is already in 2NF because:
If we had used composite keys (e.g., order_id + item_id), we'd need to check for partial dependencies.
Look for transitive dependencies in your design.
Example Problem: If we stored delivery_fee
in the Order table based on distance
, but distance
is calculated from customer and restaurant addresses, this creates a transitive dependency.
Your Task: Eliminate any transitive dependencies.
Separate calculated or derived data:
-- Instead of storing delivery_fee in Orders
-- Create a delivery pricing table
CREATE TABLE DeliveryZones (
zone_id INT PRIMARY KEY,
zone_name VARCHAR(50),
base_fee DECIMAL(10,2),
per_km_rate DECIMAL(10,2)
);
-- Link deliveries to zones
ALTER TABLE Delivery
ADD COLUMN zone_id INT,
ADD COLUMN distance_km DECIMAL(10,2),
ADD FOREIGN KEY (zone_id) REFERENCES DeliveryZones(zone_id);
Enhance your model with these real-world features:
-- Multiple addresses per customer
CREATE TABLE CustomerAddresses (
address_id INT PRIMARY KEY,
customer_id INT,
address_type VARCHAR(50), -- 'Home', 'Work', etc.
street_address VARCHAR(200),
city VARCHAR(100),
postal_code VARCHAR(20),
is_default BOOLEAN,
FOREIGN KEY (customer_id) REFERENCES Customer(customer_id)
);
-- Restaurant ratings
CREATE TABLE RestaurantReviews (
review_id INT PRIMARY KEY,
restaurant_id INT,
customer_id INT,
order_id INT,
rating INT CHECK (rating BETWEEN 1 AND 5),
comment TEXT,
review_date DATETIME,
FOREIGN KEY (restaurant_id) REFERENCES Restaurant(restaurant_id),
FOREIGN KEY (customer_id) REFERENCES Customer(customer_id),
FOREIGN KEY (order_id) REFERENCES Orders(order_id)
);
-- Driver ratings
CREATE TABLE DriverReviews (
review_id INT PRIMARY KEY,
driver_id INT,
customer_id INT,
delivery_id INT,
rating INT CHECK (rating BETWEEN 1 AND 5),
comment TEXT,
review_date DATETIME,
FOREIGN KEY (driver_id) REFERENCES Driver(driver_id),
FOREIGN KEY (customer_id) REFERENCES Customer(customer_id),
FOREIGN KEY (delivery_id) REFERENCES Delivery(delivery_id)
);
-- Promotions
CREATE TABLE Promotions (
promo_id INT PRIMARY KEY,
promo_code VARCHAR(50) UNIQUE,
description TEXT,
discount_percent DECIMAL(5,2),
discount_amount DECIMAL(10,2),
min_order_amount DECIMAL(10,2),
valid_from DATE,
valid_until DATE
);
-- Applied promotions
CREATE TABLE OrderPromotions (
order_id INT,
promo_id INT,
discount_applied DECIMAL(10,2),
PRIMARY KEY (order_id, promo_id),
FOREIGN KEY (order_id) REFERENCES Orders(order_id),
FOREIGN KEY (promo_id) REFERENCES Promotions(promo_id)
);
Write SQL queries for these business questions:
-- 1. Top 5 most ordered items
SELECT
mi.item_name,
r.name as restaurant_name,
COUNT(oi.order_item_id) as times_ordered,
SUM(oi.quantity) as total_quantity
FROM MenuItems mi
JOIN OrderItems oi ON mi.item_id = oi.item_id
JOIN Restaurant r ON mi.restaurant_id = r.restaurant_id
GROUP BY mi.item_id, mi.item_name, r.name
ORDER BY times_ordered DESC
LIMIT 5;
-- 2. Average delivery time by driver
SELECT
d.name as driver_name,
COUNT(del.delivery_id) as total_deliveries,
AVG(TIMESTAMPDIFF(MINUTE, del.pickup_time, del.delivery_time)) as avg_delivery_minutes
FROM Driver d
JOIN Delivery del ON d.driver_id = del.driver_id
WHERE del.delivery_time IS NOT NULL
GROUP BY d.driver_id, d.name
ORDER BY avg_delivery_minutes;
-- 3. Inactive customers (30 days)
SELECT
c.customer_id,
c.name,
c.email,
MAX(o.order_date) as last_order_date,
DATEDIFF(CURDATE(), MAX(o.order_date)) as days_since_last_order
FROM Customer c
LEFT JOIN Orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name, c.email
HAVING days_since_last_order > 30 OR last_order_date IS NULL;
-- 4. High-rated restaurants
SELECT
r.name,
r.cuisine_type,
AVG(rr.rating) as avg_rating,
COUNT(rr.review_id) as review_count
FROM Restaurant r
JOIN RestaurantReviews rr ON r.restaurant_id = rr.restaurant_id
GROUP BY r.restaurant_id, r.name, r.cuisine_type
HAVING avg_rating > 4.5
ORDER BY avg_rating DESC;
-- 5. Revenue by cuisine type
SELECT
r.cuisine_type,
COUNT(DISTINCT r.restaurant_id) as restaurant_count,
COUNT(o.order_id) as total_orders,
SUM(o.total_amount) as total_revenue,
AVG(o.total_amount) as avg_order_value
FROM Restaurant r
JOIN Orders o ON r.restaurant_id = o.restaurant_id
WHERE o.status = 'completed'
GROUP BY r.cuisine_type
ORDER BY total_revenue DESC;
Why is normalization important? Think about what would happen as FoodRush scales to millions of orders.
When might denormalization be useful? Consider query performance for frequently accessed data.
How would you handle real-time tracking? What additional tables or modifications would you need?
What indexes would you create? Which queries would benefit most from indexing?
Design a data warehouse schema (star or snowflake) for FoodRush's analytics team. Consider:
-- Fact table
CREATE TABLE FactOrders (
order_key INT PRIMARY KEY,
date_key INT,
customer_key INT,
restaurant_key INT,
driver_key INT,
order_amount DECIMAL(10,2),
delivery_fee DECIMAL(10,2),
discount_amount DECIMAL(10,2),
delivery_time_minutes INT,
items_count INT,
FOREIGN KEY (date_key) REFERENCES DimDate(date_key),
FOREIGN KEY (customer_key) REFERENCES DimCustomer(customer_key),
FOREIGN KEY (restaurant_key) REFERENCES DimRestaurant(restaurant_key),
FOREIGN KEY (driver_key) REFERENCES DimDriver(driver_key)
);
-- Dimension tables
CREATE TABLE DimDate (
date_key INT PRIMARY KEY,
date DATE,
day_of_week VARCHAR(20),
month VARCHAR(20),
quarter INT,
year INT,
is_weekend BOOLEAN,
is_holiday BOOLEAN
);
CREATE TABLE DimCustomer (
customer_key INT PRIMARY KEY,
customer_id INT,
name VARCHAR(100),
registration_date DATE,
customer_segment VARCHAR(50),
lifetime_value_tier VARCHAR(20)
);
-- And similar dimension tables for Restaurant and Driver
You've successfully transformed a chaotic spreadsheet into a properly normalized database design for a food delivery system. You've applied:
Your FoodRush database is now ready to scale! 🚀