Design a Distributed Cache: System Design Walkthrough

"Design a distributed cache" — often phrased as "design Redis" or "design Memcached" — is a favorite because it forces you to reason about data partitioning, replication, and consistency in one compact problem. It's really the question "how do you keep hot data in memory across many machines without one node becoming a bottleneck?" Below is a structured walkthrough you can mirror in a real system design interview, narrating each stage as you go. Keep the system design interview cheat sheet open alongside for the shared building blocks.

1. Clarify the requirements

Don't start drawing boxes. Scope the problem first so you optimize the right thing.

Functional requirements

get(key) returns a value or a miss.
set(key, value, ttl?) stores a value with an optional time-to-live.
delete(key) removes an entry.
The cache is bounded in memory, so it must evict entries when full.

Non-functional requirements

Low latency: sub-millisecond reads and writes; this is the whole reason a cache exists.
High availability: a node failure should degrade to a few extra database reads, not an outage.
Scalability: add nodes to grow capacity and throughput horizontally.
Tunable consistency: some staleness is usually acceptable in exchange for speed.

I'll assume we're building a look-aside cache in front of a database, storing small opaque values (a few KB), and that we can tolerate brief staleness. Calling those assumptions out shows judgment without burning time.

2. The cache API and access pattern

The client-facing surface is deliberately tiny — that simplicity is what makes a cache fast.

get(key)              -> value | MISS
set(key, value, ttl)  -> OK
delete(key)           -> OK

The common pattern is cache-aside (look-aside): the application checks the cache, and on a miss reads the database, populates the cache, then returns. The cache doesn't know about the database — the application orchestrates it. This keeps the cache generic and lets it fail open: if the cache is down, requests simply fall through to the database.

3. Eviction policies

Because memory is bounded, when the cache fills up something has to go. The eviction policy is a favorite follow-up.

Policy	Evicts	Best for
LRU (Least Recently Used)	The entry untouched for the longest	General default; recency predicts reuse
LFU (Least Frequently Used)	The least-accessed entry	Stable popularity distributions
TTL (Time To Live)	Entries past a fixed lifetime	Bounding staleness; often paired with LRU

LRU is the usual answer. A clean implementation is a hash map plus a doubly linked list: the map gives O(1) lookup, and the list tracks recency — on every access you move the node to the head, and you evict from the tail. Mention that real systems combine LRU with TTL so entries also expire on time even if they stay warm.

4. Sharding with consistent hashing

One machine can't hold everything, so we spread keys across a cluster. The naive approach — node = hash(key) % N — works until N changes. Add or remove a single node and almost every key remaps to a different node, causing a cluster-wide cache-miss storm that hammers the database. This is the pitfall to name and avoid.

Consistent hashing fixes it. Map both nodes and keys onto a hash ring (say 0 to 2³²). A key is owned by the first node encountered clockwise from the key's position. When a node joins or leaves, only the keys in its immediate arc move; the rest stay put. To keep load even, give each physical node many virtual nodes (replicas at multiple ring positions), so removing one machine spreads its keys across the remaining nodes instead of dumping them all on its single successor.

            Consistent hash ring (with virtual nodes)

                         [key K]
                            |  clockwise
                            v
        A1 ----------- B2 ----------- C1
       /                                \
      C2                                 A2
      |         hash(K) lands here        |
      B1                                 B3
       \                                /
        A3 ----------- C3 ----------- B4

   K is owned by the next node clockwise (B2).
   Removing node B only remaps B's arcs -> A and C absorb them.
   Virtual nodes (A1..A3, B1..B4, C1..C3) balance the load.

Consistent hashing keeps remapping local when nodes join or leave.

5. High-level architecture

The pieces fit together like this:

Client library / routing layer — holds the ring, hashes the key, and routes the request straight to the owning node (client-side routing avoids a proxy hop).
Cache nodes — each stores an in-memory hash map plus the LRU structure for its slice of the keyspace.
Membership / coordination — a lightweight service (or gossip protocol) tracks which nodes are alive and propagates ring changes.
Backing store — the database that the cache sits in front of, hit only on misses.

   Client ---> Routing layer (consistent hash ring)
                     |
        +------------+------------+
        v            v            v
    [Node A]     [Node B]     [Node C]      each: hashmap + LRU + replica
        |            |            |
        +------ replication ------+
                     |
                     v
              [ Database ]   <- hit only on cache miss

Cache-aside cluster: routing by consistent hashing, replicated nodes, database behind.

6. Replication and write policies

Two decisions determine the cache's durability and consistency behavior.

Replication. To survive a node dying, replicate each key to one or more successor nodes on the ring (a primary plus N replicas). Reads can be served from a replica for extra throughput; on primary failure, a replica is promoted. This trades memory for availability — state the replication factor explicitly.

Write policies govern how the cache and database stay in sync:

Policy	How it works	Trade-off
Write-through	Write to cache and DB synchronously	Always fresh; slower writes
Write-back	Write to cache; flush to DB asynchronously	Fast; risks data loss on failure
Write-around	Write to DB only; skip the cache	Avoids cache pollution; first read is a miss

Pair these with the read pattern: write-around plus cache-aside is a common, safe default for read-heavy workloads, while write-back suits write-heavy paths that can tolerate some loss.

7. Hot keys and cache invalidation

Two problems reliably come up once the basics are covered.

Hot keys. A single wildly popular key (a trending item, a viral post) can overwhelm the one node that owns it, defeating the point of sharding. Detect them with per-key request counters, then mitigate: replicate the hot key to several nodes, put a small local (client-side) cache in front of the distributed cache for the hottest entries, or append a random suffix so copies scatter across shards. This is closely related to controlling request volume with a rate limiter in front of the origin.

Cache invalidation. Famously one of the hard problems in computer science. When the underlying data changes, the cached copy is now wrong. Options: short TTLs so stale data self-heals; explicit invalidation (delete the key on write); or write-through so the cache updates in lockstep. Also guard against the thundering herd — when a hot key expires, many requests miss at once and stampede the database. Fix it with a per-key lock (only one request recomputes) or by serving slightly stale data while a single background refresh runs.

8. Consistency vs availability and trade-offs

Consistency vs availability. Most caches favor availability and latency, accepting brief staleness bounded by TTL or invalidation. If you truly need strong consistency, use write-through with synchronous invalidation — and pay in write latency.
Consistent hashing is non-negotiable. Always reach for it over hash-modulo-N, and explain the miss-storm it prevents plus the role of virtual nodes.
Fail open. A cache outage should degrade to database reads, never a hard error — design the client to treat cache errors as misses.
Memory management. Bounded memory plus eviction is the core constraint; pick LRU/LFU/TTL deliberately based on the access pattern.

Framework reminder: Designing a distributed cache follows the same arc as every system design answer — requirements → API → core mechanics (eviction) → the partitioning decision (consistent hashing) → replication & writes → trade-offs. The partitioning and hot-key patterns reappear everywhere, from designing Ticketmaster to designing a payment system.

Practice distributed systems with live AI support

CoPilot Interview surfaces a structured design skeleton — requirements, API, eviction, consistent hashing, and the consistency trade-off — in about 4 seconds during real Zoom and Teams calls. Free tier for Windows and macOS. Explore the free AI interview assistant.

Download free

FAQ

Why use consistent hashing in a distributed cache?

Consistent hashing maps both cache keys and nodes onto a hash ring so that adding or removing a node only remaps the keys near that node, not the entire keyspace. With a plain hash-modulo-N scheme, changing the node count reshuffles almost every key and causes a mass cache miss storm. Virtual nodes spread each physical node across many ring positions to keep the load balanced.

What is the difference between write-through, write-back, and write-around caching?

Write-through writes to the cache and the database synchronously, so the cache is always fresh but writes are slower. Write-back writes to the cache first and flushes to the database asynchronously, which is fast but risks data loss on node failure. Write-around writes only to the database and skips the cache, which avoids polluting the cache with data that may never be read.

How do you handle hot keys in a distributed cache?

A hot key is a single key that receives so much traffic it overloads one node. Fixes include replicating the hot key across multiple nodes, adding a small local (client-side) cache in front of the distributed cache, or appending a random suffix to spread copies across shards. Detecting hot keys with per-key request counters is the first step.

How does a cache decide what to evict?

Because cache memory is bounded, an eviction policy decides which entry to drop when the cache is full. LRU (Least Recently Used) evicts the entry untouched for the longest and is the common default. LFU (Least Frequently Used) evicts the least-accessed entry, better for stable popularity. TTL expires entries after a fixed lifetime regardless of access, and is often combined with LRU.

Is a distributed cache strongly consistent?

Usually not. Most distributed caches favor availability and low latency over strong consistency, accepting that a cached value may be briefly stale after the underlying database changes. Invalidation or short TTLs bound the staleness. If strong consistency is required, write-through with synchronous invalidation is used, at the cost of higher write latency.

Design a Distributed Cache — System Design Walkthrough

1. Clarify the requirements

2. The cache API and access pattern

3. Eviction policies

4. Sharding with consistent hashing

5. High-level architecture

6. Replication and write policies

7. Hot keys and cache invalidation

8. Consistency vs availability and trade-offs

Practice distributed systems with live AI support

FAQ

Related guides