
Design Twitter — System Design Interview Walkthrough

A senior-level walkthrough: requirements, capacity math, API, data model, and the timeline fan-out decision that the whole interview turns on.

"Design Twitter" is one of the most common system design interview questions, and for good reason: it packs read-heavy scale, a skewed social graph, and a genuine architectural fork — how to assemble each user's home timeline — into a single problem. Below is a structured walkthrough you can mirror in a real interview, narrating which stage you're in as you go. If you want the broader template behind it, keep the system design interview cheat sheet open alongside.

1. Clarify the requirements

Never start drawing boxes. Spend the first few minutes scoping the problem so you optimize the right thing.

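Functional requirements

- Post a tweet (text, with optional media).
- Follow another user.
- Like a tweet.
- Read a home timeline of tweets from followed accounts, paginated.

Non-functional requirements

- Read-heavy scale: timeline reads far outnumber tweet writes.
- Fast home-timeline reads, since that is the hot path.
- Eventual consistency is acceptable: a tweet may take a few seconds to reach followers' timelines.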

I'll explicitly de-scope DMs, search, trends, and notifications for now — calling that out shows judgment without burning time.

2. Capacity estimates (back-of-the-envelope)

Rough numbers justify the architecture; precision isn't the point.

| Quantity | Assumption | Result |
| --- | --- | --- |
| Daily active users | ~200M DAU | 200M |
| Tweets per day | ~2 tweets/user/day average | ~400M/day ≈ 5K writes/sec |
| Timeline reads per day | ~50 timeline loads/user/day | ~10B/day ≈ 115K reads/sec |
| Read:write ratio | reads dominate | ~20:1+ |
| Tweet storage | ~300 bytes/tweet × 400M/day | ~120 GB/day (text only) |

The headline takeaway: this is a read-heavy system with reads outpacing writes by more than an order of magnitude. That single fact pushes us to precompute timelines so reads are cheap — which leads straight to the core design decision in section 6. Media (images, video) dwarfs text storage, so it belongs in object storage, not the database.
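The arithmetic behind these estimates is easy to check. The sketch below uses only the assumptions stated in this section (200M DAU, 2 tweets and 50 timeline loads per user per day, 300 bytes per tweet); the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Assumptions taken from the estimates in this section.
dau = 200_000_000
tweets_per_user_per_day = 2
timeline_loads_per_user_per_day = 50
bytes_per_tweet = 300

tweets_per_day = dau * tweets_per_user_per_day         # 400M
reads_per_day = dau * timeline_loads_per_user_per_day  # 10B

writes_per_sec = tweets_per_day / SECONDS_PER_DAY
reads_per_sec = reads_per_day / SECONDS_PER_DAY
read_write_ratio = reads_per_day / tweets_per_day
text_storage_gb_per_day = tweets_per_day * bytes_per_tweet / 1e9

print(f"writes/sec  ~ {writes_per_sec:,.0f}")   # about 5K
print(f"reads/sec   ~ {reads_per_sec:,.0f}")    # about 115K
print(f"read:write  ~ {read_write_ratio:.0f}:1")
print(f"text GB/day ~ {text_storage_gb_per_day:.0f}")
```

With these inputs the ratio works out to 25:1, which is what the "~20:1+" row rounds down from.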

3. API design

A small REST surface covers the requirements. The home-timeline read is the hot path and must stay cheap.

POST /v1/tweets            { "text": "...", "mediaIds?": [...] }
POST /v1/follow            { "targetUserId": "..." }
POST /v1/tweets/{id}/like
GET  /v1/feed?cursor=...    -> page of timeline tweets (cursor-paginated)

Cursor-based pagination (rather than offset) keeps the feed read stable and efficient as new tweets arrive at the top.
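To see why a cursor is more stable than an offset, here is a minimal sketch. It assumes the cursor is simply the ID of the last tweet the client saw and that the timeline is a list of tweet IDs sorted newest first; `get_feed_page` is a hypothetical helper, not part of any real API.

```python
from typing import List, Optional, Tuple


def get_feed_page(
    timeline_ids: List[int],
    cursor: Optional[int] = None,
    limit: int = 3,
) -> Tuple[List[int], Optional[int]]:
    """Return one page of tweet IDs plus the cursor for the next page.

    timeline_ids must be sorted newest first (descending IDs).
    cursor is the last tweet ID the client has already seen, or None.
    """
    if cursor is None:
        candidates = timeline_ids
    else:
        # Only tweets strictly older than the cursor are eligible.
        candidates = [t for t in timeline_ids if t < cursor]
    page = candidates[:limit]
    # A next cursor exists only if older tweets remain after this page.
    next_cursor = page[-1] if len(candidates) > limit else None
    return page, next_cursor


timeline = [90, 80, 70, 60, 50]
page1, cursor1 = get_feed_page(timeline)

# A new tweet lands at the top between the two requests.
timeline = [100] + timeline
page2, cursor2 = get_feed_page(timeline, cursor=cursor1)
```

The second page picks up at 60 even though a new tweet arrived in between. An offset of 3 against the updated list would have returned 70 a second time. A real store would answer the "older than cursor" filter with a range query rather than a scan.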

4. Data model

Three core entities, plus likes. Tweet IDs come from a distributed, time-sortable generator (e.g., Snowflake) so they sort chronologically without a central counter.
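A time-sortable ID generator can be sketched in a few lines. The version below follows the commonly described Snowflake layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit per-millisecond sequence); the epoch constant and the class name are arbitrary choices for illustration, and this is a sketch rather than a production implementation.

```python
import threading
import time

EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch, assumed for this sketch
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeLikeIdGenerator:
    def __init__(self, worker_id: int):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self._worker_id = worker_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return int(time.time() * 1000)

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: bump the sequence, wrapping at 12 bits.
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self._worker_id << SEQUENCE_BITS)
                | self._sequence
            )


gen = SnowflakeLikeIdGenerator(worker_id=1)
ids = [gen.next_id() for _ in range(1000)]
```

Because the timestamp occupies the high bits, sorting by ID sorts by creation time, and each worker can mint IDs without coordinating with the others.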

| Entity | Key fields | Store / partitioning |
| --- | --- | --- |
| User | userId, handle, displayName, createdAt | Relational/KV, sharded by userId |
| Tweet | tweetId, authorId, text, mediaRefs, createdAt | Wide-column/KV, sharded by tweetId |
| Follow | followerId, followeeId, createdAt | Graph/KV, indexed both directions |
| Like | tweetId, userId, createdAt | KV, sharded by tweetId; counters aggregated |

The Follow table must answer two queries fast: "who does X follow?" (to build a timeline) and "who follows X?" (to fan out a tweet). Index both directions.
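"Index both directions" amounts to writing each edge twice, once per lookup direction. An in-memory sketch of the idea (the `FollowGraph` class is hypothetical; a real system would keep the two indexes in a sharded store):

```python
from collections import defaultdict
from typing import Dict, Set


class FollowGraph:
    """Follow edges indexed in both directions."""

    def __init__(self) -> None:
        self._following: Dict[str, Set[str]] = defaultdict(set)  # X -> who X follows
        self._followers: Dict[str, Set[str]] = defaultdict(set)  # X -> who follows X

    def follow(self, follower_id: str, followee_id: str) -> None:
        if follower_id == followee_id:
            raise ValueError("a user cannot follow themselves")
        # One logical edge, two index entries.
        self._following[follower_id].add(followee_id)
        self._followers[followee_id].add(follower_id)

    def unfollow(self, follower_id: str, followee_id: str) -> None:
        self._following[follower_id].discard(followee_id)
        self._followers[followee_id].discard(follower_id)

    def following(self, user_id: str) -> Set[str]:
        """Who does user_id follow? Used to build a timeline."""
        return set(self._following.get(user_id, ()))

    def followers(self, user_id: str) -> Set[str]:
        """Who follows user_id? Used to fan out a tweet."""
        return set(self._followers.get(user_id, ()))


graph = FollowGraph()
graph.follow("bob", "alice")
graph.follow("carol", "alice")
graph.follow("bob", "dave")
```

The cost is that every follow and unfollow touches two indexes, which in a sharded store may live on different partitions.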

5. High-level architecture

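Stateless services sit behind a gateway, with the write and read paths separated. On the write path, a posted tweet is stored once and then fanned out asynchronously through a queue to followers' timeline caches. On the read path, the timeline service serves the home timeline from that cache.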

6. The core problem: timeline generation

This is where the interview is won or lost. How do we build a user's home timeline from the people they follow?

Fan-out on write (push). When a user tweets, the fan-out service immediately writes that tweet ID into every follower's timeline cache. Reads are then trivial — just return the precomputed list. The cost is write amplification: a user with millions of followers triggers millions of writes per tweet.
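A minimal in-memory sketch of the push model follows. The per-user cap on timeline length (`TIMELINE_MAX_LEN`) and the function names are assumptions made for illustration; a real fan-out service would write to a shared cache rather than a local dict.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Sequence

TIMELINE_MAX_LEN = 800  # assumed cap on cached tweet IDs per user


def new_timeline_cache() -> Dict[str, Deque[int]]:
    # Each user's timeline is a bounded list of tweet IDs, newest first.
    return defaultdict(lambda: deque(maxlen=TIMELINE_MAX_LEN))


def fan_out_on_write(
    tweet_id: int,
    author_id: str,
    followers_of: Dict[str, Sequence[str]],
    timelines: Dict[str, Deque[int]],
) -> int:
    """Push tweet_id into every follower's timeline; return the write count."""
    followers = followers_of.get(author_id, ())
    for follower_id in followers:
        # appendleft keeps newest first; a full deque drops its oldest entry.
        timelines[follower_id].appendleft(tweet_id)
    return len(followers)


def read_timeline(
    user_id: str, timelines: Dict[str, Deque[int]], limit: int = 20
) -> List[int]:
    """Reads are a plain lookup of the precomputed list."""
    return list(timelines[user_id])[:limit]


followers_of = {"alice": ["bob", "carol"]}
timelines = new_timeline_cache()
writes_first = fan_out_on_write(101, "alice", followers_of, timelines)
writes_second = fan_out_on_write(102, "alice", followers_of, timelines)
```

The return value makes the write amplification visible: the number of writes per tweet equals the author's follower count.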

Fan-out on read (pull). Store each tweet once. At read time, the timeline service gathers recent tweets from everyone the user follows and merges them. Writes are cheap, but reads are expensive and slow for users who follow thousands of accounts.
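The pull model is essentially a k-way merge at read time. The sketch below assumes each author's tweets are already stored as a list of time-sortable IDs, newest first; the function and dictionary names are illustrative.

```python
import heapq
from itertools import islice
from typing import Dict, List, Sequence


def fan_out_on_read(
    user_id: str,
    following: Dict[str, Sequence[str]],
    tweets_by_author: Dict[str, List[int]],
    limit: int = 20,
) -> List[int]:
    """Merge the newest tweets of everyone user_id follows.

    Each list in tweets_by_author must be sorted newest first.
    """
    per_author = [
        tweets_by_author.get(author_id, [])
        for author_id in following.get(user_id, ())
    ]
    # heapq.merge lazily merges already-sorted inputs; reverse=True
    # matches the descending (newest first) order of the inputs.
    merged = heapq.merge(*per_author, reverse=True)
    return list(islice(merged, limit))


following = {"bob": ["alice", "dave"]}
tweets_by_author = {"alice": [105, 101], "dave": [104, 102, 100]}
top3 = fan_out_on_read("bob", following, tweets_by_author, limit=3)
```

The work per read grows with the number of accounts followed, which is why this path is slow for users who follow thousands of accounts.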

| | Fan-out on write (push) | Fan-out on read (pull) |
| --- | --- | --- |
| Read cost | Very low (precomputed) | High (merge at read time) |
| Write cost | High (one write per follower) | Very low (single write) |
| Best for | Normal users (read-heavy) | Celebrities / inactive users |
| Weakness | Celebrity write storm | Slow, expensive reads |

The celebrity problem and the hybrid. Pure push breaks for accounts with tens of millions of followers — one tweet becomes a write storm. Pure pull punishes ordinary users with slow reads. The production answer is a hybrid: push tweets for normal accounts, but for a small set of celebrity accounts, skip fan-out and pull their tweets at read time, merging them into the user's precomputed timeline. So a home-timeline read = "precomputed list (push)" + "recent tweets from the handful of celebrities I follow (pull)," merged and sorted. Naming the celebrity problem and landing on this hybrid is the single highest-signal moment in the interview.
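The hybrid read can be sketched as one merge over two sources. The follower-count threshold that marks an account as a celebrity is an assumed, tunable value, and the function names are hypothetical.

```python
import heapq
from itertools import islice
from typing import Dict, List, Sequence

CELEBRITY_FOLLOWER_THRESHOLD = 1_000_000  # assumed cutoff, tune per system


def should_push(author_id: str, follower_counts: Dict[str, int]) -> bool:
    """Write side: fan out only for accounts below the celebrity cutoff."""
    return follower_counts.get(author_id, 0) < CELEBRITY_FOLLOWER_THRESHOLD


def hybrid_timeline(
    user_id: str,
    precomputed: Dict[str, List[int]],
    following: Dict[str, Sequence[str]],
    follower_counts: Dict[str, int],
    tweets_by_author: Dict[str, List[int]],
    limit: int = 20,
) -> List[int]:
    """Read side: precomputed list (push) merged with celebrity tweets (pull).

    All tweet ID lists must be sorted newest first.
    """
    pushed = precomputed.get(user_id, [])
    celebrity_lists = [
        tweets_by_author.get(author_id, [])
        for author_id in following.get(user_id, ())
        if not should_push(author_id, follower_counts)
    ]
    merged = heapq.merge(pushed, *celebrity_lists, reverse=True)
    return list(islice(merged, limit))


follower_counts = {"alice": 120, "star": 30_000_000}
following = {"bob": ["alice", "star"]}
precomputed = {"bob": [103, 101]}          # alice's tweets, already pushed
tweets_by_author = {"star": [104, 102]}    # star's tweets, pulled at read time
timeline = hybrid_timeline(
    "bob", precomputed, following, follower_counts, tweets_by_author
)
```

Because celebrity tweets are never pushed, the two sources do not overlap and the merge needs no deduplication. The pull side stays cheap as long as a user follows only a handful of accounts above the threshold.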

7. Caching and media

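Caching is what makes the read path cheap at 100K+ reads/sec. Each user's timeline is held as a list of recent tweet IDs in a cache such as Redis, and media is served from object storage behind a CDN rather than from the database.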

8. Bottlenecks and trade-offs

Framework reminder: Design Twitter follows the same arc as every system design answer: requirements → estimates → API → data model → high-level design → the core decision → trade-offs. It's a close cousin of designing a news feed, and the same arc carries over to smaller problems such as designing a rate limiter.


FAQ

How do you generate the home timeline in Design Twitter?

By fanning out tweets. Fan-out on write precomputes each follower's timeline when a tweet is posted, making reads fast but writes expensive. Fan-out on read assembles the timeline at read time, making writes cheap but reads slow. Production Twitter uses a hybrid that pushes for normal users and pulls for celebrities.

What is the celebrity problem when designing Twitter?

With fan-out on write, a tweet from an account with tens of millions of followers triggers that many timeline writes - a write storm. The fix is to pull celebrity tweets at read time and merge them into the precomputed timeline, while still pushing normal users' tweets.

How read-heavy is Twitter?

Very. Timeline reads outnumber tweet writes by more than an order of magnitude (20:1 or more in the estimates above), so the design optimizes the read path - which is exactly why fan-out on write (precomputing timelines) is the default for non-celebrity accounts.

What data stores would you use to design Twitter?

A tweet store partitioned by tweet ID, a social graph for follower/followee edges, a per-user timeline cache of recent tweet IDs in Redis, and object storage behind a CDN for media. IDs are generated by a distributed, time-sortable scheme like Snowflake.

Is Twitter strongly consistent?

No - eventual consistency is acceptable. A tweet taking a few seconds to appear in followers' timelines lets fan-out run asynchronously through a queue, which is essential to handle the write load at scale.