Practice and reinforce the concepts from Lesson 5
You're part of the FIFA Analytics Team during the World Cup knockout stages. The broadcast is LIVE with 3 billion viewers watching worldwide, and commentators need real-time statistics:
The stakes are high: Broadcast quality affects FIFA's reputation and viewer engagement. Your SQL queries must be accurate, fast, and deliver insights that millions depend on.
Your mission: Master advanced aggregations and window functions to power the world's biggest sporting event analytics.
Our database tracks the FIFA World Cup with four interconnected tables:
-- Teams table - National teams competing
CREATE TABLE teams (
team_id SERIAL PRIMARY KEY,
name VARCHAR(50) NOT NULL,
country VARCHAR(50) NOT NULL,
group_letter CHAR(1),
fifa_ranking INT
);
-- Players table - Registered players
CREATE TABLE players (
player_id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
team_id INT REFERENCES teams(team_id),
position VARCHAR(20),
jersey_number INT,
age INT
);
-- Matches table - All tournament games
CREATE TABLE matches (
match_id SERIAL PRIMARY KEY,
home_team_id INT REFERENCES teams(team_id),
away_team_id INT REFERENCES teams(team_id),
match_date DATE,
stage VARCHAR(20), -- 'Group', 'Round of 16', 'Quarter', 'Semi', 'Final'
home_score INT DEFAULT 0,
away_score INT DEFAULT 0,
stadium VARCHAR(100)
);
-- Goals table - Individual goal records
CREATE TABLE goals (
goal_id SERIAL PRIMARY KEY,
match_id INT REFERENCES matches(match_id),
player_id INT REFERENCES players(player_id),
team_id INT REFERENCES teams(team_id),
minute INT,
goal_type VARCHAR(20) -- 'Regular', 'Penalty', 'Own Goal', 'Free Kick'
);
-- Insert teams (Group Stage subset)
INSERT INTO teams VALUES
(1, 'Brazil', 'Brazil', 'G', 1),
(2, 'Germany', 'Germany', 'E', 11),
(3, 'Argentina', 'Argentina', 'C', 3),
(4, 'France', 'France', 'D', 4),
(5, 'Spain', 'Spain', 'E', 7),
(6, 'England', 'England', 'B', 5),
(7, 'Portugal', 'Portugal', 'H', 9),
(8, 'Netherlands', 'Netherlands', 'A', 6);
-- Insert players (sample from each team)
INSERT INTO players VALUES
(1, 'Neymar Jr', 1, 'Forward', 10, 31),
(2, 'Casemiro', 1, 'Midfielder', 5, 31),
(3, 'Alisson', 1, 'Goalkeeper', 1, 30),
(4, 'Kai Havertz', 2, 'Forward', 7, 24),
(5, 'Joshua Kimmich', 2, 'Midfielder', 6, 28),
(6, 'Manuel Neuer', 2, 'Goalkeeper', 1, 37),
(7, 'Lionel Messi', 3, 'Forward', 10, 36),
(8, 'Enzo Fernandez', 3, 'Midfielder', 24, 22),
(9, 'Emiliano Martinez', 3, 'Goalkeeper', 23, 31),
(10, 'Kylian Mbappe', 4, 'Forward', 10, 25),
(11, 'Antoine Griezmann', 4, 'Midfielder', 7, 32),
(12, 'Hugo Lloris', 4, 'Goalkeeper', 1, 36),
(13, 'Alvaro Morata', 5, 'Forward', 7, 30),
(14, 'Pedri', 5, 'Midfielder', 8, 21),
(15, 'Unai Simon', 5, 'Goalkeeper', 1, 26),
(16, 'Harry Kane', 6, 'Forward', 9, 30),
(17, 'Jude Bellingham', 6, 'Midfielder', 22, 20),
(18, 'Jordan Pickford', 6, 'Goalkeeper', 1, 29),
(19, 'Cristiano Ronaldo', 7, 'Forward', 7, 38),
(20, 'Bruno Fernandes', 7, 'Midfielder', 8, 29),
(21, 'Diogo Costa', 7, 'Goalkeeper', 22, 24),
(22, 'Memphis Depay', 8, 'Forward', 10, 29),
(23, 'Frenkie de Jong', 8, 'Midfielder', 21, 26),
(24, 'Jasper Cillessen', 8, 'Goalkeeper', 1, 34);
-- Insert matches (Group stage + knockout)
INSERT INTO matches VALUES
(1, 1, 8, '2024-11-20', 'Group', 3, 2, 'Lusail Stadium'),
(2, 2, 5, '2024-11-21', 'Group', 1, 1, 'Al Bayt Stadium'),
(3, 3, 6, '2024-11-22', 'Group', 2, 0, '974 Stadium'),
(4, 4, 7, '2024-11-23', 'Group', 2, 1, 'Education City'),
(5, 1, 5, '2024-11-24', 'Group', 2, 0, 'Khalifa Stadium'),
(6, 2, 8, '2024-11-25', 'Group', 2, 2, 'Al Thumama'),
(7, 3, 7, '2024-11-26', 'Group', 3, 1, 'Ahmad Bin Ali'),
(8, 4, 6, '2024-11-27', 'Group', 1, 2, 'Al Janoub'),
(9, 1, 3, '2024-12-03', 'Round of 16', 4, 2, 'Lusail Stadium'),
(10, 2, 4, '2024-12-04', 'Round of 16', 0, 2, 'Al Bayt Stadium'),
(11, 6, 7, '2024-12-05', 'Round of 16', 3, 0, 'Education City'),
(12, 5, 8, '2024-12-06', 'Round of 16', 1, 1, 'Khalifa Stadium');
-- Insert goals (30+ goal records)
INSERT INTO goals VALUES
-- Match 1: Brazil 3-2 Netherlands
(1, 1, 1, 1, 23, 'Regular'),
(2, 1, 1, 1, 45, 'Penalty'),
(3, 1, 2, 1, 67, 'Regular'),
(4, 1, 22, 8, 34, 'Regular'),
(5, 1, 23, 8, 78, 'Free Kick'),
-- Match 2: Germany 1-1 Spain
(6, 2, 4, 2, 56, 'Regular'),
(7, 2, 13, 5, 72, 'Regular'),
-- Match 3: Argentina 2-0 England
(8, 3, 7, 3, 12, 'Regular'),
(9, 3, 7, 3, 88, 'Regular'),
-- Match 4: France 2-1 Portugal
(10, 4, 10, 4, 19, 'Regular'),
(11, 4, 11, 4, 65, 'Regular'),
(12, 4, 19, 7, 41, 'Penalty'),
-- Match 5: Brazil 2-0 Spain
(13, 5, 1, 1, 28, 'Regular'),
(14, 5, 2, 1, 83, 'Regular'),
-- Match 6: Germany 2-2 Netherlands
(15, 6, 4, 2, 15, 'Regular'),
(16, 6, 5, 2, 54, 'Regular'),
(17, 6, 22, 8, 38, 'Regular'),
(18, 6, 22, 8, 90, 'Regular'),
-- Match 7: Argentina 3-1 Portugal
(19, 7, 7, 3, 31, 'Free Kick'),
(20, 7, 8, 3, 55, 'Regular'),
(21, 7, 7, 3, 76, 'Regular'),
(22, 7, 19, 7, 68, 'Regular'),
-- Match 8: France 1-2 England
(23, 8, 10, 4, 22, 'Regular'),
(24, 8, 16, 6, 44, 'Penalty'),
(25, 8, 17, 6, 81, 'Regular'),
-- Match 9: Brazil 4-2 Argentina
(26, 9, 1, 1, 18, 'Regular'),
(27, 9, 1, 1, 35, 'Regular'),
(28, 9, 2, 1, 58, 'Regular'),
(29, 9, 1, 1, 90, 'Regular'),
(30, 9, 7, 3, 27, 'Penalty'),
(31, 9, 7, 3, 72, 'Regular'),
-- Match 10: Germany 0-2 France
(32, 10, 10, 4, 33, 'Regular'),
(33, 10, 11, 4, 79, 'Regular'),
-- Match 11: England 3-0 Portugal
(34, 11, 16, 6, 40, 'Regular'),
(35, 11, 16, 6, 62, 'Regular'),
(36, 11, 17, 6, 85, 'Regular'),
-- Match 12: Spain 1-1 Netherlands
(37, 12, 13, 5, 51, 'Regular'),
(38, 12, 22, 8, 77, 'Regular');
Before diving into advanced techniques, let's review the fundamental aggregate functions:
-- Total goals scored in the tournament
SELECT COUNT(*) as total_goals
FROM goals;
-- Result: 38 goals
-- Average goals per match
SELECT AVG(home_score + away_score) as avg_goals_per_match
FROM matches;
-- Result: ~3.17 goals per match
-- Highest scoring match
SELECT MAX(home_score + away_score) as highest_score
FROM matches;
-- Result: 6 goals (Brazil 4-2 Argentina)
When you need to break down data by multiple dimensions:
-- Goals by team and tournament stage
SELECT
t.name as team,
m.stage,
COUNT(g.goal_id) as goals
FROM goals g
JOIN teams t ON g.team_id = t.team_id
JOIN matches m ON g.match_id = m.match_id
GROUP BY t.name, m.stage
ORDER BY t.name, m.stage;
Expected Output:
Argentina | Group | 5
Argentina | Round of 16 | 2
Brazil | Group | 7
Brazil | Round of 16 | 4
England | Group | 2
England | Round of 16 | 3
...
Use HAVING to filter after aggregation (unlike WHERE which filters before):
-- Teams with 5 or more total goals
SELECT
t.name,
COUNT(g.goal_id) as total_goals
FROM goals g
JOIN teams t ON g.team_id = t.team_id
GROUP BY t.name
HAVING COUNT(g.goal_id) >= 5
ORDER BY total_goals DESC;
Expected Output:
Brazil | 11
Argentina | 7
France | 5
England | 5
Window functions perform calculations across rows without collapsing them into groups. They're essential for rankings, running totals, and comparative analytics.
-- Top scorers with sequential ranking (no ties)
SELECT
p.name,
t.name as team,
COUNT(g.goal_id) as goals,
ROW_NUMBER() OVER (ORDER BY COUNT(g.goal_id) DESC) as rank
FROM goals g
JOIN players p ON g.player_id = p.player_id
JOIN teams t ON p.team_id = t.team_id
GROUP BY p.player_id, p.name, t.name
ORDER BY goals DESC
LIMIT 10;
Expected Output:
Neymar Jr | Brazil | 7 | 1
Lionel Messi | Argentina | 6 | 2
Kylian Mbappe | France | 4 | 3
Harry Kane | England | 3 | 4
Memphis Depay | Netherlands | 3 | 5
...
-- Compare ranking methods with tied scores
SELECT
p.name,
COUNT(g.goal_id) as goals,
RANK() OVER (ORDER BY COUNT(g.goal_id) DESC) as rank,
DENSE_RANK() OVER (ORDER BY COUNT(g.goal_id) DESC) as dense_rank
FROM goals g
JOIN players p ON g.player_id = p.player_id
GROUP BY p.player_id, p.name
ORDER BY goals DESC;
Key Difference:
RANK(): Skips numbers after ties (1, 2, 2, 4, 5)DENSE_RANK(): No gaps in ranking (1, 2, 2, 3, 4)-- Cumulative goals as tournament progresses
SELECT
m.match_date,
COUNT(g.goal_id) as daily_goals,
SUM(COUNT(g.goal_id)) OVER (ORDER BY m.match_date) as cumulative_goals
FROM goals g
JOIN matches m ON g.match_id = m.match_id
GROUP BY m.match_date
ORDER BY m.match_date;
Expected Output:
2024-11-20 | 5 | 5
2024-11-21 | 2 | 7
2024-11-22 | 2 | 9
2024-11-23 | 3 | 12
...
Complete the query to show the top 10 scorers with rankings:
-- TODO: Add the window function to calculate rank
SELECT
p.name as player_name,
t.name as team_name,
COUNT(g.goal_id) as goals
-- TODO: Add ROW_NUMBER() OVER (...) as golden_boot_rank
FROM goals g
JOIN players p ON g.player_id = p.player_id
JOIN teams t ON p.team_id = t.team_id
-- TODO: Add GROUP BY clause
-- TODO: Add ORDER BY to show highest scorers first
LIMIT 10;
Success Criteria:
ROW_NUMBER() for sequential rankingCreate a query that calculates the group stage standings with:
-- TODO: Build the complete standings query
-- Hint: You'll need CASE statements to calculate wins/draws/losses
-- Hint: SUM() both home and away scenarios
-- Hint: Calculate goals_for, goals_against, goal_difference, points
Success Criteria:
Calculate the goals per match for players who have played 2+ matches:
-- TODO: Complete the window function
SELECT
p.name,
t.name as team,
COUNT(DISTINCT g.match_id) as matches_played,
COUNT(g.goal_id) as total_goals
-- TODO: Add window function to calculate goals_per_match_rank
-- Use PARTITION BY position (forwards vs midfielders)
FROM goals g
JOIN players p ON g.player_id = p.player_id
JOIN teams t ON p.team_id = t.team_id
-- TODO: Add appropriate GROUP BY and HAVING clauses
ORDER BY total_goals DESC;
Success Criteria:
PARTITION BYRun your queries and check against these expected outcomes:
1. Neymar Jr (Brazil) - 7 goals
2. Lionel Messi (Argentina) - 6 goals
3. Kylian Mbappe (France) - 4 goals
4. Harry Kane (England) - 3 goals
5. Memphis Depay (Netherlands) - 3 goals
-- Final cumulative should equal total goals
SELECT SUM(home_score + away_score) FROM matches;
-- Should match the last row of running total query: 38 goals
-- Verify RANK vs DENSE_RANK for tied players
SELECT
COUNT(g.goal_id) as goals,
COUNT(DISTINCT RANK() OVER (ORDER BY COUNT(g.goal_id) DESC)) as rank_values,
COUNT(DISTINCT DENSE_RANK() OVER (ORDER BY COUNT(g.goal_id) DESC)) as dense_values
FROM goals g
JOIN players p ON g.player_id = p.player_id
GROUP BY p.player_id;
Window functions can be "reset" for each partition:
-- Best scorer from EACH team
SELECT
p.name,
t.name as team,
COUNT(g.goal_id) as goals,
ROW_NUMBER() OVER (PARTITION BY t.team_id ORDER BY COUNT(g.goal_id) DESC) as team_rank
FROM goals g
JOIN players p ON g.player_id = p.player_id
JOIN teams t ON p.team_id = t.team_id
GROUP BY p.player_id, p.name, t.team_id, t.name
ORDER BY t.name, team_rank;
This shows the top scorer per team rather than globally.
Window functions vs. Subqueries:
-- ❌ SLOW: Correlated subquery
SELECT name,
(SELECT COUNT(*) FROM goals WHERE player_id = p.player_id) as goals
FROM players p;
-- ✅ FAST: Window function
SELECT DISTINCT p.name,
COUNT(g.goal_id) OVER (PARTITION BY p.player_id) as goals
FROM players p
LEFT JOIN goals g ON p.player_id = g.player_id;
Why window functions are faster:
Window functions power many production analytics:
Complete all three TODO queries and submit your work through the Activity Submission Form.
Submission Checklist: