Welcome, database architects! You've been hired by a startup building the next big social media platform. They want a feed that shows posts with likes, comments, hashtags, and embedded media - all loading instantly for millions of users.
Your team started with SQL, but something's wrong. The queries are getting slower, the joins are multiplying, and the database is struggling. Time to discover why companies like Facebook and Twitter turned to NoSQL!
By the end of this lesson, you'll be able to:
Your startup, TelebortGram, needs a feed system that can handle:
Let's see why SQL struggles with this!
Let's try building our social feed with SQL first:
-- Our SQL schema attempt
CREATE TABLE posts (
id INT PRIMARY KEY,
user_id INT,
content TEXT,
created_at TIMESTAMP
);
CREATE TABLE comments (
id INT PRIMARY KEY,
post_id INT,
user_id INT,
content TEXT,
parent_comment_id INT -- For nested replies
);
CREATE TABLE likes (
user_id INT,
post_id INT,
PRIMARY KEY (user_id, post_id)
);
CREATE TABLE media (
id INT PRIMARY KEY,
post_id INT,
media_type VARCHAR(10),
url VARCHAR(255)
);
-- Getting a single post with all its data requires...
SELECT p.*, u.username, u.avatar,
COUNT(DISTINCT l.user_id) as like_count,
COUNT(DISTINCT c.id) as comment_count
FROM posts p
JOIN users u ON p.user_id = u.id
LEFT JOIN likes l ON p.id = l.post_id
LEFT JOIN comments c ON p.id = c.post_id
LEFT JOIN media m ON p.id = m.post_id
WHERE p.id = 123
GROUP BY p.id, u.username, u.avatar;
-- And we haven't even loaded the actual comments yet!
Challenge: Try to write a query that loads a post with all comments, nested replies, user info for each commenter, and media. Notice how complex it becomes?
-- Loading a feed of 20 posts with all their data
-- This query would need to join across 6+ tables
-- For each post, potentially hundreds of rows
-- Result: Slow loading feeds that users hate!
NoSQL databases store data differently. Instead of spreading data across tables, they keep related data together:
// A single post in MongoDB (a document database)
{
"_id": "507f1f77bcf86cd799439011",
"user": {
"id": "user123",
"username": "coolcoder",
"avatar": "https://..."
},
"content": "Just built my first app! 🚀",
"media": [
{
"type": "image",
"url": "https://...",
"thumbnail": "https://..."
}
],
"likes": ["user456", "user789", "user321"],
"likeCount": 3,
"comments": [
{
"id": "comment1",
"user": {
"id": "user456",
"username": "bestfriend"
},
"content": "Amazing work!",
"timestamp": "2024-01-15T10:30:00Z",
"replies": [
{
"user": {"id": "user123", "username": "coolcoder"},
"content": "Thanks! 😊",
"timestamp": "2024-01-15T10:35:00Z"
}
]
}
],
"hashtags": ["coding", "firstapp", "mongodb"],
"timestamp": "2024-01-15T10:00:00Z"
}
One query, all data! No joins needed.
Document Stores (MongoDB, CouchDB)
Key-Value Stores (Redis, DynamoDB)
Column-Family Stores (Cassandra, HBase)
Graph Databases (Neo4j, Amazon Neptune)
The CAP theorem states that distributed systems can only guarantee two of:
// Real-world example: Instagram likes
// They chose AP (Availability + Partition Tolerance)
// Result: Like counts might be slightly different
// for different users momentarily, but the app never goes down
Feature | SQL | NoSQL |
---|---|---|
Structure | Fixed schema | Flexible schema |
Scaling | Vertical (bigger server) | Horizontal (more servers) |
ACID | Full ACID guarantees | Eventually consistent |
Queries | Complex queries with joins | Simple queries, denormalized |
Best for | Financial data, inventory | Social media, real-time |
// MongoDB in action
const { MongoClient } = require('mongodb');
// Connect to MongoDB
const client = new MongoClient('mongodb://localhost:27017');
const db = client.db('telebortgram');
const posts = db.collection('posts');
// Creating a post - so much simpler!
async function createPost(userId, content, media) {
const post = {
user: await getUserInfo(userId),
content: content,
media: media || [],
likes: [],
likeCount: 0,
comments: [],
hashtags: extractHashtags(content),
timestamp: new Date()
};
return await posts.insertOne(post);
}
// Getting the feed - one query!
async function getFeed(userId, limit = 20) {
return await posts
.find({ "user.followers": userId })
.sort({ timestamp: -1 })
.limit(limit)
.toArray();
}
// Real-time like updates
async function likePost(postId, userId) {
return await posts.updateOne(
{ _id: postId },
{
$addToSet: { likes: userId },
$inc: { likeCount: 1 }
}
);
}
// Nested comments with replies
async function addComment(postId, userId, content, parentCommentId = null) {
const comment = {
id: generateId(),
user: await getUserInfo(userId),
content: content,
timestamp: new Date(),
replies: []
};
if (parentCommentId) {
// Add as a reply
return await posts.updateOne(
{ _id: postId, "comments.id": parentCommentId },
{ $push: { "comments.$.replies": comment } }
);
} else {
// Add as top-level comment
return await posts.updateOne(
{ _id: postId },
{ $push: { comments: comment } }
);
}
}
Your product manager wants a new feature: "Stories" that disappear after 24 hours, can have multiple media items, reactions (not just likes), and viewing statistics.
Design the document structure for this feature. Consider:
// Start your design here:
const storyDocument = {
// Your structure...
};
You've discovered why social media giants use NoSQL:
Remember: NoSQL isn't "better" than SQL - it's different. The best architects know when to use each one. Now you're ready to build the next viral app!
Ready to make TelebortGram go viral? Let's code!