"Design Netflix" is one of the most common video streaming system design questions, and it trips people up because they reach for a database when the real answer is a CDN. Netflix is overwhelmingly read-heavy: the catalog changes slowly, but each title is streamed billions of times. The winning insight is to do all the expensive work once — transcode every title ahead of time and push the segments to edge servers — so that playback becomes a cheap static-file fetch.
Here's the full walkthrough with a diagram, covering the transcoding pipeline, adaptive bitrate streaming over a CDN, metadata, recommendations, and the exact sequence of the play path.
   UPLOAD / INGEST PATH                       PLAYBACK PATH

    +----------------+                    +------------------+
    |   Content /    |                    |  Client player   |
    | Studio upload  |                    | (TV, phone, web) |
    +-------+--------+                    +---------+--------+
            |                                       | 1. GET manifest
            v                                       v
    +----------------+                    +------------------+
    | Chunk splitter |                    |  Playback API /  |
    +-------+--------+                    | metadata service |
            | fan out                     +---------+--------+
            v                                       | 2. authorize + URLs
 +----------+----------+                            v
 | Transcoding workers |                  +------------------+
 | (parallel encode:   |                  |  Recommendation  |
 |  1080p/720p/480p,   |                  |  / catalog DB    |
 |  H.264/AV1, HLS/DASH|                  +------------------+
 +----------+----------+                            |
            | segments + manifests                  | 3. fetch segments
            v                                       v
    +----------------+   push     +----------------------------------+
    | Origin storage |=========>  |         CDN edge caches          |
    | (S3 / object)  |            |  (segments cached near viewers)  |
    +----------------+            +----------------------------------+

1. Clarify the requirements
Functional requirements
- Upload/ingest a title and process it for streaming
- Browse and search the catalog
- Stream video smoothly across devices and network conditions
- Personalized recommendations and "continue watching"
- Track watch history and playback position
Non-functional requirements
- Low startup latency and minimal rebuffering during playback
- Highly available and globally distributed
- Scale to hundreds of millions of viewers
- Durable storage of master files (a lost master cannot be re-shot)
Back-of-the-envelope scale: Assume ~250M subscribers, tens of millions of concurrent streams at peak, and an average bitrate of ~5 Mbps. At 20M concurrent streams that is 20M × 5 Mbps = 100 Tbps, so peak egress is on the order of a hundred terabits per second or more. No central origin can serve that volume, which is precisely why the CDN is the heart of the design.
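To make the estimate concrete, here is a minimal sketch of the arithmetic. The function name and the three concurrency figures are illustrative assumptions; only the ~5 Mbps average comes from the estimate above.

```python
def peak_egress_tbps(concurrent_streams: int, avg_bitrate_mbps: float) -> float:
    """Aggregate egress in terabits per second (1 Tbps = 1,000,000 Mbps)."""
    return concurrent_streams * avg_bitrate_mbps / 1_000_000


# Illustrative peak-concurrency assumptions, all within "tens of millions".
for streams in (10_000_000, 20_000_000, 50_000_000):
    tbps = peak_egress_tbps(streams, 5)
    print(f"{streams:>11,} streams x 5 Mbps = {tbps:,.0f} Tbps")
```

Under these assumptions the range runs from 50 Tbps to 250 Tbps, which is why the answer pivots to edge delivery so quickly.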
2. API design
The read path is a small, cacheable set of calls. The heavy bytes never touch the app tier — the API just hands back URLs into the CDN.
# Browse / search
GET /v1/catalog?row=trending&profile=123
GET /v1/titles/{titleId}
# Start playback -> returns a manifest of renditions + CDN URLs
POST /v1/playback/{titleId} -> { manifestUrl, drmToken, cdnBase }
# Report progress (for "continue watching")
POST /v1/watch/{titleId} { positionSec, deviceId }
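As a rough illustration of what the playback call does, the sketch below authorizes a profile, resolves the title, and returns the three fields shown above. Everything else is a hypothetical stand-in: the in-memory `ACTIVE_PROFILES` and `CATALOG` tables, the `cdn.example.com` host, and the random token, which is an opaque placeholder rather than a real DRM license.

```python
import secrets
from dataclasses import asdict, dataclass


@dataclass
class PlaybackResponse:
    manifestUrl: str
    drmToken: str
    cdnBase: str


# Hypothetical stand-ins for the subscription and catalog services.
ACTIVE_PROFILES = {"123"}
CATALOG = {"tt001": "titles/tt001/master.m3u8"}
CDN_BASE = "https://cdn.example.com"  # placeholder host


def start_playback(profile_id: str, title_id: str) -> dict:
    """Authorize, resolve the title, and hand back URLs into the CDN."""
    if profile_id not in ACTIVE_PROFILES:
        raise PermissionError("no active subscription")
    if title_id not in CATALOG:
        raise KeyError(f"unknown title: {title_id}")
    return asdict(PlaybackResponse(
        manifestUrl=f"{CDN_BASE}/{CATALOG[title_id]}",
        drmToken=secrets.token_urlsafe(16),  # opaque placeholder, not a DRM license
        cdnBase=CDN_BASE,
    ))
```

Note that the handler returns only small JSON-sized values; no video bytes pass through it.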
3. The upload and transcoding pipeline
When a title is ingested, the master file is processed offline, once. This is the batch half of the system and it must be parallel to be fast: a two-hour movie encoded serially into a dozen renditions would take far too long.
- Chunk the master. Split the source into short chunks so they can be encoded independently.
- Fan out to workers. A fleet of transcoding workers encodes each chunk in parallel into every target resolution (1080p, 720p, 480p…) and codec (H.264, HEVC, AV1).
- Package into segments. Outputs are cut into short HLS/DASH segments (2–10s) and a manifest is generated that lists every rendition and its segment URLs.
- Validate and publish. Segments are checked, written to origin storage, and pushed out to the CDN so edges are warm before the title goes live.
The key idea: pay the encoding cost once at ingest so every one of the billions of subsequent views is just a static fetch.
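The fan-out described above can be sketched in a few lines. This is a toy model, not a real encoder: `encode_chunk` only builds an output path where a real worker would run an encoder, the 10-second chunk length and the three-rung ladder are assumptions, and a production splitter would also need to cut on keyframe boundaries.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

RENDITIONS = ["1080p", "720p", "480p"]  # assumed ladder
CHUNK_SECONDS = 10                      # assumed chunk length


def split_into_chunks(duration_sec: int, chunk_sec: int = CHUNK_SECONDS):
    """Return (start, end) second ranges covering the whole master."""
    return [(s, min(s + chunk_sec, duration_sec))
            for s in range(0, duration_sec, chunk_sec)]


def encode_chunk(job):
    """Placeholder for an encode: a real worker would run an encoder here."""
    (start, end), rendition = job
    return f"{rendition}/seg_{start:05d}_{end:05d}.ts"


def transcode(duration_sec: int, workers: int = 8) -> dict:
    chunks = split_into_chunks(duration_sec)
    jobs = list(product(chunks, RENDITIONS))  # every chunk x every rendition
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = list(pool.map(encode_chunk, jobs))  # map preserves job order
    manifest = {r: [] for r in RENDITIONS}
    for (_, rendition), path in zip(jobs, outputs):
        manifest[rendition].append(path)
    return manifest


print(transcode(25)["720p"])
```

Because every (chunk, rendition) pair is an independent job, adding workers shortens wall-clock time without changing the output.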
4. CDN and adaptive bitrate streaming
Serving video from a central origin to a global audience is impossible at Netflix scale — latency and egress both explode. Instead, transcoded segments are cached on CDN edge servers physically close to viewers (Netflix runs its own Open Connect appliances inside ISPs). The origin is hit only on a cache miss, so it stays quiet.
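A minimal sketch of the "origin is hit only on a cache miss" behavior, using an LRU cache as a stand-in for an edge server. The `Origin` and `EdgeCache` classes are hypothetical, capacity is counted in segments rather than bytes, and real edge software uses more sophisticated placement and eviction than plain LRU.

```python
from collections import OrderedDict


class Origin:
    """Stand-in for origin object storage; counts how often it is hit."""

    def __init__(self, objects: dict):
        self.objects = objects
        self.requests = 0

    def fetch(self, key: str) -> bytes:
        self.requests += 1
        return self.objects[key]


class EdgeCache:
    """LRU cache in front of an origin; capacity is counted in segments."""

    def __init__(self, origin: Origin, capacity: int):
        self.origin = origin
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key: str) -> bytes:
        if key in self.store:
            self.store.move_to_end(key)      # mark as most recently used
            return self.store[key]
        data = self.origin.fetch(key)        # cache miss: the only origin hit
        self.store[key] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)   # evict least recently used
        return data
```

Requesting the same segment many times through `EdgeCache.get` increments `origin.requests` only once, until the segment is evicted.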
Adaptive bitrate streaming (ABR) is what keeps playback smooth. Because each title exists at several bitrates, the client player measures its available bandwidth and, for each segment, requests the highest quality it can sustain. When the network degrades mid-stream, the very next segment drops to a lower rendition — the viewer sees a brief quality dip instead of a stall. HLS and DASH are the two standard protocols that implement this segment-plus-manifest model.
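The per-segment decision can be sketched as a simple throughput rule. The bitrate ladder and the 0.8 safety factor below are illustrative assumptions, and real players typically combine throughput estimates with buffer level rather than relying on throughput alone.

```python
# Illustrative ladder: rendition name -> bitrate in kbps (assumed values).
RENDITIONS_KBPS = {"480p": 1_500, "720p": 3_000, "1080p": 5_000}


def pick_rendition(measured_kbps: float, safety: float = 0.8) -> str:
    """Highest rendition whose bitrate fits within a fraction of throughput."""
    budget = measured_kbps * safety
    fitting = [name for name, kbps in RENDITIONS_KBPS.items() if kbps <= budget]
    if not fitting:
        # Nothing fits: fall back to the lowest rendition instead of stalling.
        return min(RENDITIONS_KBPS, key=RENDITIONS_KBPS.get)
    return max(fitting, key=RENDITIONS_KBPS.get)


# Simulated throughput measured before each segment request.
throughput_per_segment = [8_000, 7_500, 2_500, 1_000, 6_500]
choices = [pick_rendition(t) for t in throughput_per_segment]
print(choices)  # ['1080p', '1080p', '480p', '480p', '1080p']
```

The third and fourth segments drop to 480p when throughput falls, and quality recovers on the fifth, which is the "brief quality dip instead of a stall" behavior.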
The one-liner interviewers want: transcode once into multiple bitrates, chop into short segments, cache them on a CDN near users, and let the client pick the best rendition per segment. That single sentence is 80% of the Netflix answer.
5. The play path (step by step)
When a viewer hits play, here's the sequence — note how little of it involves moving video through your servers:
- Client calls the playback API, which authorizes the subscription and resolves the title.
- The API returns a manifest listing the available bitrate renditions plus a DRM token and CDN base URLs.
- The player picks a starting rendition and fetches segments directly from the nearest CDN edge.
- As bandwidth changes, the player adapts the rendition per segment; it periodically reports position so "continue watching" works across devices.
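For a sense of what the manifest in the second step looks like, here is a small sketch that emits an HLS master playlist. The `#EXTM3U` and `#EXT-X-STREAM-INF` tags with `BANDWIDTH` and `RESOLUTION` attributes are part of HLS; the `Rendition` class, the bitrates, and the `<name>/playlist.m3u8` layout are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rendition:
    name: str
    bandwidth_bps: int
    width: int
    height: int


def master_playlist(renditions: list) -> str:
    """Build an HLS master playlist listing one variant stream per rendition."""
    lines = ["#EXTM3U"]
    for r in sorted(renditions, key=lambda r: r.bandwidth_bps):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r.bandwidth_bps},"
            f"RESOLUTION={r.width}x{r.height}"
        )
        lines.append(f"{r.name}/playlist.m3u8")  # assumed path layout
    return "\n".join(lines) + "\n"


print(master_playlist([
    Rendition("1080p", 5_000_000, 1920, 1080),
    Rendition("480p", 1_500_000, 854, 480),
    Rendition("720p", 3_000_000, 1280, 720),
]))
```

The player reads this list once, then chooses among the variant playlists segment by segment.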
6. Metadata, recommendations & storage
Metadata service. Titles, descriptions, artwork, cast, and availability are read constantly on the browse screen, so they live in a database tuned for fast reads and sit behind a cache. This is separate from the video bytes entirely.
Recommendations. Ranking models are computed offline from watch history and title features, and the results (the ranked rows on your home screen) are precomputed and cached per profile. The serving path just reads a prepared list — you do not run a model on the request path.
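The offline/online split can be sketched as two functions. The "model" here is a deliberately trivial genre count standing in for a real ranking model, and `build_rows`, `ROW_CACHE`, and `serve_home_row` are hypothetical names; the point is only that the request path is a cache read.

```python
from collections import defaultdict


def build_rows(watch_history: dict, title_genre: dict, top_n: int = 3) -> dict:
    """Offline job: rank unwatched titles by how often the profile watched
    that genre (a toy stand-in for a real ranking model)."""
    rows = {}
    for profile, watched in watch_history.items():
        genre_counts = defaultdict(int)
        for title in watched:
            genre_counts[title_genre[title]] += 1
        candidates = [t for t in title_genre if t not in watched]
        # Most-watched genre first; title id breaks ties deterministically.
        candidates.sort(key=lambda t: (-genre_counts[title_genre[t]], t))
        rows[profile] = candidates[:top_n]
    return rows


ROW_CACHE = {}  # stand-in for a per-profile cache filled by the offline job


def serve_home_row(profile: str) -> list:
    """Request path: a plain cache read, no model evaluation."""
    return ROW_CACHE.get(profile, [])
```

A batch job would call `build_rows` periodically and write the result into the cache; the home screen only ever calls `serve_home_row`.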
Storage tiers. Master files and transcoded segments live in durable object storage (S3-style), which is the source of truth. Hot content is cached at the edge; cold content lives only at origin. Watch history and playback position go in their own store optimized for high write throughput.
Key trade-offs the interviewer probes
- Pre-transcode vs transcode on the fly. Pre-transcoding costs storage (many renditions per title) but makes playback a cheap cache hit. On-the-fly encoding saves storage but adds latency and CPU on the hot path — the wrong trade for a read-heavy catalog.
- Push vs pull to the CDN. Popular new releases are pushed to edges proactively so the launch spike is a cache hit; long-tail titles are pulled on first request. Pre-warming avoids a thundering-herd miss on origin.
- Startup latency vs quality. Starting at a low rendition and ramping up gets playback going fast; starting high risks a long initial buffering delay. ABR resolves this by starting conservatively and adapting after the first segment.
- More renditions vs storage cost. Each extra resolution/codec improves the match to device and network but multiplies storage and encoding work.
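The storage side of these trade-offs is easy to estimate. The sketch below assumes a three-rung bitrate ladder and, as a simplification, that every codec uses the same ladder (in practice newer codecs usually reach similar quality at lower bitrates); all numbers are illustrative.

```python
def title_storage_gb(duration_min: float, ladder_mbps: list, codecs: int = 1) -> float:
    """Storage for one title in GB: sum of rendition bitrates x duration
    x number of codecs (1 GB = 8,000 megabits)."""
    seconds = duration_min * 60
    megabits = sum(ladder_mbps) * seconds * codecs
    return megabits / 8_000


ladder = [1.5, 3.0, 5.0]  # assumed Mbps for 480p, 720p, 1080p
print(f"{title_storage_gb(120, ladder):.2f} GB")            # 8.55 GB
print(f"{title_storage_gb(120, ladder, codecs=3):.2f} GB")  # 25.65 GB
```

Storage grows linearly with each added rendition or codec, which is the cost being weighed against a cheap cache hit at playback time.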
Framework reminder: every system design answer follows the same arc — requirements → estimates → API → high-level design → data model → scale → trade-offs. Keep the system design cheat sheet in mind and narrate which stage you're in.
FAQ
Why is Netflix mostly a read-heavy CDN problem, not a database problem?
The catalog changes slowly but is streamed billions of times, so the dominant load is serving huge, immutable video files to viewers. The winning move is to pre-transcode every title once and push the resulting segments to a CDN close to users, turning playback into a cheap static-file fetch rather than a live computation.
What is adaptive bitrate streaming (ABR)?
Each title is encoded at several resolutions and bitrates and chopped into short segments (2 to 10 seconds). The client player measures available bandwidth and, for each segment, requests the highest quality it can sustain, stepping down at the next segment boundary when the network degrades. HLS and DASH are the two standard protocols that implement this.
How does the transcoding pipeline work?
An uploaded master file is split into chunks and fanned out to a fleet of workers that encode each chunk in parallel into every target resolution and codec. The outputs are packaged into HLS/DASH segments with manifest files, validated, and pushed to origin storage and the CDN. Parallel chunked encoding is what makes transcoding a full movie fast.
How does a user actually start playing a video?
The client calls the playback API, which authorizes the user, resolves the title, and returns a manifest listing the available bitrate renditions plus CDN URLs for the segments. The player fetches segments directly from the nearest CDN edge and adapts quality per segment. The app tier serves metadata and playback authorization; the video bytes come from the CDN.
How does Netflix store and serve so much video?
Master files and transcoded segments live in durable object storage (S3-style) as the source of truth. Popular content is cached at CDN edge locations near viewers, so the origin is hit rarely. Metadata such as titles and artwork lives in databases tuned for fast reads, while watch history and playback position go in a separate store optimized for high write throughput.