Design Twitter: System Design Interview Walkthrough

"Design Twitter" is one of the most common system design interview questions, and for good reason: it packs read-heavy scale, a skewed social graph, and a genuine architectural fork — how to assemble each user's home timeline — into a single problem. Below is a structured walkthrough you can mirror in a real interview, narrating which stage you're in as you go. If you want the broader template behind it, keep the system design interview cheat sheet open alongside.

1. Clarify the requirements

Never start by drawing boxes. Spend the first few minutes scoping the problem so you optimize the right thing.

Functional requirements

A user can post a tweet (text, optionally with media).
A user can follow other users.
A user's home timeline shows recent tweets from people they follow, newest-first (or ranked).
A user can like a tweet.

Non-functional requirements

Read-heavy: timelines are viewed far more often than tweets are written.
Low-latency timeline: the home feed should load in well under a second.
High availability: a brief delay in a tweet appearing is fine; downtime is not.
Scale: hundreds of millions of users and a highly skewed follower graph.

I'll explicitly de-scope DMs, search, trends, and notifications for now — calling that out shows judgment without burning time.

2. Capacity estimates (back-of-the-envelope)

Rough numbers justify the architecture; precision isn't the point.

Quantity	Assumption	Result
Daily active users	~200M DAU	200M
Tweets per day	~2 tweets/user/day average	~400M/day ≈ 5K writes/sec
Timeline reads per day	~50 timeline loads/user/day	~10B/day ≈ 115K reads/sec
Read:write ratio	10B reads / 400M writes per day	~25:1
Tweet storage	~300 bytes/tweet × 400M/day	~120 GB/day (text only)
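The table's arithmetic is easy to fumble out loud, so here is a small sketch that derives each row from the stated assumptions (200M DAU, 2 tweets and 50 timeline loads per user per day, ~300 bytes per tweet). The function name and its parameters are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate(dau=200_000_000, tweets_per_user=2, loads_per_user=50, bytes_per_tweet=300):
    """Reproduce the back-of-the-envelope table from its stated assumptions."""
    tweets_per_day = dau * tweets_per_user  # 400M
    reads_per_day = dau * loads_per_user    # 10B
    return {
        "writes_per_sec": tweets_per_day / SECONDS_PER_DAY,
        "reads_per_sec": reads_per_day / SECONDS_PER_DAY,
        "read_write_ratio": reads_per_day / tweets_per_day,
        "text_gb_per_day": tweets_per_day * bytes_per_tweet / 1e9,
    }


for name, value in estimate().items():
    print(f"{name:>18}: {value:,.0f}")
```

Running it prints about 4,630 writes/sec, 115,741 reads/sec, a read:write ratio of 25, and 120 GB/day of text, which matches the rounded figures in the table.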

The headline takeaway: this is a read-heavy system with reads outpacing writes by more than an order of magnitude. That single fact pushes us to precompute timelines so reads are cheap — which leads straight to the core design decision in section 6. Media (images, video) dwarfs text storage, so it belongs in object storage, not the database.

3. API design

A small REST surface covers the requirements. The home-timeline read is the hot path and must stay cheap.

POST /v1/tweets            { "text": "...", "mediaIds": [...] }   (mediaIds optional)
POST /v1/follow            { "targetUserId": "..." }
POST /v1/tweets/{id}/like
GET  /v1/feed?cursor=...    -> page of timeline tweets (cursor-paginated)

Cursor-based pagination (rather than offset) keeps the feed read stable and efficient as new tweets arrive at the top.
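To make the stability argument concrete, here is a minimal sketch of cursor pagination over a newest-first list of tweet IDs, assuming the cursor is simply the last tweet ID the client received (a common choice, not something the API above mandates). `feed_page` is a hypothetical helper name.

```python
from typing import List, Optional, Tuple


def feed_page(timeline_ids: List[int], cursor: Optional[int],
              limit: int = 20) -> Tuple[List[int], Optional[int]]:
    """Return one page of a newest-first timeline plus the cursor for the next page.

    Tweet IDs are time-sortable, so "the next page" means "IDs strictly smaller
    than the cursor". That stays correct even after new tweets are prepended.
    """
    if cursor is None:
        start = 0
    else:
        # Linear scan for clarity; a real store would seek by key (id < cursor).
        start = next((i for i, tid in enumerate(timeline_ids) if tid < cursor),
                     len(timeline_ids))
    page = timeline_ids[start:start + limit]
    next_cursor = page[-1] if start + limit < len(timeline_ids) else None
    return page, next_cursor
```

For example, `feed_page([10, 9, 8, 7, 6], None, 2)` returns `([10, 9], 9)`. If tweet 11 then arrives at the top, `feed_page([11, 10, 9, 8, 7, 6], 9, 2)` still returns `([8, 7], 7)`, whereas offset 2 with limit 2 would have returned `[9, 8]` and shown tweet 9 twice.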

4. Data model

Three core entities, plus likes. Tweet IDs come from a distributed, time-sortable generator (e.g., Snowflake) so they sort chronologically without a central counter.

Entity	Key fields	Store / partitioning
User	userId, handle, displayName, createdAt	Relational/KV, sharded by userId
Tweet	tweetId, authorId, text, mediaRefs, createdAt	Wide-column/KV, sharded by tweetId
Follow	followerId, followeeId, createdAt	Graph/KV, indexed both directions
Like	tweetId, userId, createdAt	KV, sharded by tweetId; counters aggregated
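A Snowflake-style generator is worth being able to sketch. The version below uses the commonly described 64-bit layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit per-millisecond sequence); the epoch constant and the class and function names are assumptions chosen for illustration, not Twitter's actual code.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # assumed custom epoch: 2020-01-01T00:00:00Z
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER_ID = (1 << WORKER_BITS) - 1   # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095


class SnowflakeLikeIdGenerator:
    """64-bit IDs: 41-bit ms timestamp | 10-bit worker ID | 12-bit sequence."""

    def __init__(self, worker_id: int):
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000 - EPOCH_MS

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:  # 4,096 IDs used this ms: wait for the next ms
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now << (WORKER_BITS + SEQUENCE_BITS))
                    | (self.worker_id << SEQUENCE_BITS)
                    | self.sequence)


def created_at_ms(tweet_id: int) -> int:
    """Recover the Unix-millisecond creation time embedded in an ID."""
    return (tweet_id >> (WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MS
```

Because the timestamp occupies the high bits, sorting IDs numerically sorts tweets by creation time, and each worker only needs a unique worker ID rather than a shared counter.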

The Follow table must answer two queries fast: "who does X follow?" (to build a timeline) and "who follows X?" (to fan out a tweet). Index both directions.
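"Index both directions" in practice means storing each follow edge twice. This in-memory sketch (a hypothetical `FollowGraph` class standing in for a KV or graph store) makes both queries a single lookup.

```python
from collections import defaultdict


class FollowGraph:
    """Each follow edge is stored twice so both directions are one lookup."""

    def __init__(self):
        self._following = defaultdict(set)  # followerId -> {followeeId}
        self._followers = defaultdict(set)  # followeeId -> {followerId}

    def follow(self, follower_id: int, followee_id: int) -> None:
        self._following[follower_id].add(followee_id)
        self._followers[followee_id].add(follower_id)

    def unfollow(self, follower_id: int, followee_id: int) -> None:
        self._following[follower_id].discard(followee_id)
        self._followers[followee_id].discard(follower_id)

    def following_of(self, user_id: int) -> frozenset:
        """'Who does X follow?': used when building X's timeline."""
        return frozenset(self._following.get(user_id, ()))

    def followers_of(self, user_id: int) -> frozenset:
        """'Who follows X?': used when fanning out X's tweet."""
        return frozenset(self._followers.get(user_id, ()))
```

The trade-off is that every follow or unfollow costs two writes, which is cheap compared with scanning one direction on every timeline build or fan-out.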

5. High-level architecture

Stateless services behind a gateway, with the write and read paths separated:

Tweet service — validates and persists new tweets to the tweet store, then emits a "new tweet" event.
Fan-out service — consumes those events from a queue and distributes tweet IDs into followers' timelines.
Timeline service — serves GET /feed by reading a user's precomputed timeline cache and hydrating tweet bodies.
Timeline cache — a per-user Redis list of recent tweet IDs.
Media & CDN — uploads go to blob storage; URLs are served from a CDN.
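The separation between the write and read paths can be sketched with in-memory stand-ins: a `queue.Queue` plays the message queue, and dicts play the tweet store and timeline cache. All class and function names here are hypothetical; the point is that `post` returns as soon as the event is enqueued, and delivery happens on a background worker.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    author_id: int
    text: str


class TweetService:
    """Write path: persist the tweet, emit a 'new tweet' event, return at once."""

    def __init__(self, tweet_store: dict, events: queue.Queue):
        self.tweet_store = tweet_store
        self.events = events

    def post(self, tweet: Tweet) -> int:
        if not tweet.text.strip():
            raise ValueError("tweet text must not be empty")
        self.tweet_store[tweet.tweet_id] = tweet  # persist first
        self.events.put(tweet)                    # then publish; no fan-out here
        return tweet.tweet_id


def start_fanout_worker(events: queue.Queue,
                        deliver: Callable[[Tweet], None]) -> threading.Thread:
    """Fan-out service: consume events in the background and hand each to deliver()."""
    def run():
        while True:
            tweet = events.get()
            try:
                deliver(tweet)
            finally:
                events.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


class TimelineService:
    """Read path: read precomputed tweet IDs, then hydrate them into tweet bodies."""

    def __init__(self, timelines: dict, tweet_store: dict):
        self.timelines = timelines
        self.tweet_store = tweet_store

    def get_feed(self, user_id: int, limit: int = 20) -> list:
        ids = self.timelines.get(user_id, [])[:limit]
        return [self.tweet_store[i] for i in ids if i in self.tweet_store]
```

The `deliver` callback is where a push, pull, or hybrid policy would plug in; keeping it out of `TweetService` is what lets posting stay fast regardless of follower count.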

6. The core problem: timeline generation

This is where the interview is won or lost. How do we build a user's home timeline from the people they follow?

Fan-out on write (push). When a user tweets, the fan-out service immediately writes that tweet ID into every follower's timeline cache. Reads are then trivial — just return the precomputed list. The cost is write amplification: a user with millions of followers triggers millions of writes per tweet.
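A minimal sketch of the push model, with dicts of lists standing in for the per-user timeline cache and `followers_of` as an assumed lookup function. It returns the number of timeline writes so the write amplification is explicit: it always equals the author's follower count.

```python
from collections import defaultdict


def fan_out_on_write(tweet_id, author_id, followers_of, timelines):
    """Push model: copy tweet_id to the front of every follower's timeline.

    Returns the number of timeline writes (the write amplification).
    """
    writes = 0
    for follower_id in followers_of(author_id):
        timelines[follower_id].insert(0, tweet_id)  # newest first
        writes += 1
    return writes


def read_home_timeline(user_id, timelines, limit=20):
    """Push model read: the timeline is already assembled, so just slice it."""
    return timelines.get(user_id, [])[:limit]
```

With `timelines = defaultdict(list)`, an author with 3 followers costs 3 writes per tweet; an author with 10 million followers would cost 10 million.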

Fan-out on read (pull). Store each tweet once. At read time, the timeline service gathers recent tweets from everyone the user follows and merges them. Writes are cheap, but reads are expensive and slow for users who follow thousands of accounts.
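The pull model's read is a k-way merge of each followee's recent tweets. This sketch assumes each author's tweet IDs are already stored newest-first and that IDs are time-sortable, so `heapq.merge(..., reverse=True)` yields the combined timeline newest-first; the function and parameter names are illustrative.

```python
import heapq
from itertools import islice


def fan_out_on_read(user_id, following_of, tweets_by_author, limit=20):
    """Pull model: merge each followee's newest-first tweet IDs at read time.

    Cost grows with the number of accounts the user follows.
    """
    per_author = [tweets_by_author.get(a, []) for a in following_of(user_id)]
    merged = heapq.merge(*per_author, reverse=True)  # inputs must be descending
    return list(islice(merged, limit))
```

The write side is a single append to the author's own list; the price is paid here, on every read, once per followed account.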

	Fan-out on write (push)	Fan-out on read (pull)
Read cost	Very low (precomputed)	High (merge at read time)
Write cost	High (one write per follower)	Very low (single write)
Best for	Normal users (read-heavy)	Celebrities / inactive users
Weakness	Celebrity write storm	Slow, expensive reads

The celebrity problem and the hybrid. Pure push breaks for accounts with tens of millions of followers — one tweet becomes a write storm. Pure pull punishes ordinary users with slow reads. The production answer is a hybrid: push tweets for normal accounts, but for a small set of celebrity accounts, skip fan-out and pull their tweets at read time, merging them into the user's precomputed timeline. So a home-timeline read = "precomputed list (push)" + "recent tweets from the handful of celebrities I follow (pull)," merged and sorted. Naming the celebrity problem and landing on this hybrid is the single highest-signal moment in the interview.
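Both halves of the hybrid fit in a short sketch. The follower-count cutoff of 1,000,000 is an assumed number for illustration (real systems tune it), and the helper names are hypothetical. On write, `should_fan_out` decides whether to push; on read, the precomputed list is merged with celebrity tweets pulled on demand.

```python
import heapq

CELEBRITY_THRESHOLD = 1_000_000  # assumed follower-count cutoff; tune in practice


def should_fan_out(author_id, follower_count_of, threshold=CELEBRITY_THRESHOLD) -> bool:
    """Write path: push for normal accounts, skip fan-out for celebrities."""
    return follower_count_of(author_id) < threshold


def hybrid_home_timeline(user_id, precomputed, following_of, follower_count_of,
                         tweets_by_author, limit=20, threshold=CELEBRITY_THRESHOLD):
    """Read path: merge the pushed timeline with celebrity tweets pulled now.

    All lists hold newest-first, time-sortable tweet IDs (larger = newer).
    """
    pushed = precomputed.get(user_id, [])
    pulled = [tweets_by_author.get(a, []) for a in following_of(user_id)
              if follower_count_of(a) >= threshold]
    result, seen = [], set()
    for tweet_id in heapq.merge(pushed, *pulled, reverse=True):
        if tweet_id in seen:  # guard against an ID present in both sources
            continue
        seen.add(tweet_id)
        result.append(tweet_id)
        if len(result) == limit:
            break
    return result
```

The duplicate guard matters when an account crosses the threshold and some of its older tweets were already pushed before it started being pulled.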

7. Caching and media

Caching is what makes the read path cheap at 100K+ reads/sec:

Timeline cache: per-user lists of recent tweet IDs in Redis, bounded to the latest few hundred entries so memory stays predictable.
Tweet (object) cache: hot tweet bodies cached so hydration doesn't hit the tweet store every read.
Media via blob + CDN: images and video live in object storage and are served from a CDN at the edge, so heavy bytes never touch your app servers on the read path.
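The two application-side caches can be sketched in memory. In Redis, a bounded timeline is typically an `LPUSH` followed by `LTRIM`; here a `deque` with `maxlen` gives the same "keep only the newest N" behavior. The bound of 500 and the class names are assumptions, and the tweet cache omits the eviction (LRU or TTL) a real deployment would need.

```python
from collections import deque
from itertools import islice

TIMELINE_MAX = 500  # assumed bound for "the latest few hundred entries"


class TimelineCache:
    """In-memory stand-in for per-user Redis lists kept short with LPUSH + LTRIM."""

    def __init__(self, max_len: int = TIMELINE_MAX):
        self.max_len = max_len
        self._lists = {}

    def push(self, user_id: int, tweet_id: int) -> None:
        timeline = self._lists.setdefault(user_id, deque(maxlen=self.max_len))
        timeline.appendleft(tweet_id)  # when full, the oldest ID falls off the right end

    def newest(self, user_id: int, limit: int) -> list:
        return list(islice(self._lists.get(user_id, ()), limit))


class TweetCache:
    """Read-through cache of tweet bodies used during hydration (no eviction here)."""

    def __init__(self, tweet_store: dict):
        self.tweet_store = tweet_store
        self._cache = {}
        self.store_reads = 0  # how many lookups reached the backing store

    def get_many(self, tweet_ids: list) -> list:
        for tid in tweet_ids:
            if tid not in self._cache:
                self.store_reads += 1
                body = self.tweet_store.get(tid)
                if body is not None:
                    self._cache[tid] = body
        return [self._cache[tid] for tid in tweet_ids if tid in self._cache]
```

Hydrating the same page twice touches the tweet store only on the first read, which is what keeps the hot path off the database.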

8. Bottlenecks and trade-offs

Push vs pull vs hybrid. Push gives fast reads but expensive writes; pull gives cheap writes but slow reads. Always land on the hybrid and explain the celebrity threshold.
Asynchronous fan-out. Run fan-out through a message queue so posting returns instantly and delivery happens in the background, smoothing out write spikes.
Eventual consistency. A tweet taking a few seconds to appear in followers' feeds is acceptable, and that relaxation is exactly what makes async fan-out possible at scale.
Sharding hotspots. Shard timelines by userId and tweets by tweetId; watch for hot shards around viral tweets and replicate read-heavy keys.
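Shard routing and hot-key replication can be sketched as follows. This uses plain hash-mod routing for brevity; consistent hashing is the usual substitute when shards are added or removed, since it avoids remapping most keys. The shard count, replica count, and function names are all assumptions for illustration.

```python
import hashlib
import random

NUM_SHARDS = 64       # assumed shard count
HOT_KEY_REPLICAS = 4  # assumed number of copies kept for a hot key


def shard_for(key: int, num_shards: int = NUM_SHARDS) -> int:
    """Stable hash-mod routing: the same key maps to the same shard in every process."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def write_shards(key: int, hot_keys: set, num_shards: int = NUM_SHARDS) -> list:
    """A hot key is written to several consecutive shards; other keys to one."""
    copies = HOT_KEY_REPLICAS if key in hot_keys else 1
    primary = shard_for(key, num_shards)
    return [(primary + i) % num_shards for i in range(copies)]


def read_shard(key: int, hot_keys: set, num_shards: int = NUM_SHARDS) -> int:
    """Reads of a hot key pick one of its copies at random to spread the load."""
    return random.choice(write_shards(key, hot_keys, num_shards))
```

A SHA-256 digest is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would route the same key differently on different servers.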

Framework reminder: Design Twitter follows the same arc as every system design answer (requirements → estimates → API → data model → high-level design → the core decision → trade-offs). It's a close cousin of designing a news feed, and the same framework carries over to other classic prompts such as designing a rate limiter.

Practice Design Twitter with live AI support

CoPilot Interview surfaces a structured design skeleton — requirements, capacity, API, data model, and the fan-out trade-off — in about 4 seconds during real Zoom and Teams calls. Free for Windows and macOS. Learn more about the AI interview assistant.


FAQ

How do you generate the home timeline in Design Twitter?

By fanning out tweets. Fan-out on write precomputes each follower's timeline when a tweet is posted, making reads fast but writes expensive. Fan-out on read assembles the timeline at read time, making writes cheap but reads slow. Production Twitter uses a hybrid that pushes for normal users and pulls for celebrities.

What is the celebrity problem when designing Twitter?

With fan-out on write, a tweet from an account with tens of millions of followers triggers that many timeline writes: a write storm. The fix is to pull celebrity tweets at read time and merge them into the precomputed timeline, while still pushing normal users' tweets.

How read-heavy is Twitter?

Very read-heavy. Under the estimates above, timeline reads outnumber tweet writes by roughly 25:1, more than an order of magnitude, so the design optimizes the read path. That is exactly why fan-out on write (precomputing timelines) is the default for non-celebrity accounts.

What data stores would you use to design Twitter?

A tweet store partitioned by tweet ID, a social graph for follower/followee edges, a per-user timeline cache of recent tweet IDs in Redis, and object storage behind a CDN for media. IDs are generated by a distributed, time-sortable scheme like Snowflake.

Is Twitter strongly consistent?

No; eventual consistency is acceptable. A tweet taking a few seconds to appear in followers' timelines lets fan-out run asynchronously through a queue, which is essential to handle the write load at scale.
