Design a Chat App: System Design Interview Walkthrough

"Design a chat app" (WhatsApp, Messenger, Slack) is the canonical real-time system design question. The crux is maintaining millions of persistent connections and routing each message to the exact server holding the recipient's connection — plus presence, delivery receipts, and durable history.

Here's the full walkthrough with a diagram, covering the WebSocket gateway, connection routing, and delivery guarantees.

[Figure: Chat app architecture: WebSocket gateway, chat service, presence service, message queue, message store]

1. Clarify the requirements

Functional requirements

One-to-one and group messaging
Online/last-seen presence
Delivery and read receipts
Message history and sync across devices
Push notifications when the recipient is offline

Non-functional requirements

Real-time delivery (low latency)
Durable — messages are never lost
Scale to millions of concurrent connections
Ordered delivery within a conversation

Back-of-the-envelope scale: Assume 50M concurrent connections and billions of messages/day. The hard part isn't throughput — it's holding tens of millions of long-lived connections and finding the right one per message.
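
To make the estimate concrete, here is a rough calculation. Only the 50M concurrent connections come from the text above; the 5 billion messages/day, the 500,000 connections per gateway, and the 10 KiB of memory per idle connection are illustrative assumptions, not measured figures.

```python
# Back-of-the-envelope sizing. Only CONCURRENT_CONNECTIONS comes from the text;
# every other constant is an assumption chosen for illustration.
CONCURRENT_CONNECTIONS = 50_000_000
MESSAGES_PER_DAY = 5_000_000_000        # assumed: "billions" taken as 5 billion
CONNECTIONS_PER_GATEWAY = 500_000       # assumed capacity of one gateway server
BYTES_PER_IDLE_CONNECTION = 10 * 1024   # assumed memory cost of one open socket

SECONDS_PER_DAY = 24 * 60 * 60

# Average message rate across the whole system.
avg_messages_per_second = MESSAGES_PER_DAY / SECONDS_PER_DAY

# Ceiling division: number of gateways needed just to hold the sockets.
gateways_needed = -(-CONCURRENT_CONNECTIONS // CONNECTIONS_PER_GATEWAY)

# Total memory spent on idle connections, in GiB.
total_connection_memory_gib = (
    CONCURRENT_CONNECTIONS * BYTES_PER_IDLE_CONNECTION / 1024**3
)

print(f"average messages/second: {avg_messages_per_second:,.0f}")
print(f"gateways needed:         {gateways_needed}")
print(f"connection memory (GiB): {total_connection_memory_gib:,.0f}")
```

Under these assumptions the average rate works out to under 60,000 messages per second spread over about 100 gateways, which is modest per server. The connection count, not the message rate, is what sizes the gateway fleet.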

2. API design

Chat is connection-oriented, not request/response. Clients hold a WebSocket; messages flow as events over it.

# Persistent connection
WS /connect                      (upgraded, authenticated)

# Events over the socket
-> { type: "send", to: "userB", text: "hi" }
<- { type: "message", from: "userA", text: "hi", id, ts }
<- { type: "receipt", messageId, status: "delivered" | "read" }
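
A gateway typically validates each frame before acting on it. The sketch below assumes the events are encoded as JSON (the listing above does not say) and checks the fields shown there; `parse_event` and `REQUIRED_FIELDS` are hypothetical names introduced for illustration.

```python
import json

# Required fields per event type, taken from the event listing above.
REQUIRED_FIELDS = {
    "send": {"to", "text"},
    "message": {"from", "text", "id", "ts"},
    "receipt": {"messageId", "status"},
}
RECEIPT_STATUSES = {"delivered", "read"}


def parse_event(raw: str) -> dict:
    """Decode one JSON frame and check it carries the fields its type requires."""
    event = json.loads(raw)
    if not isinstance(event, dict):
        raise ValueError("event must be a JSON object")

    event_type = event.get("type")
    if event_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown event type: {event_type!r}")

    missing = REQUIRED_FIELDS[event_type] - set(event)
    if missing:
        raise ValueError(f"{event_type} event missing fields: {sorted(missing)}")

    if event_type == "receipt" and event["status"] not in RECEIPT_STATUSES:
        raise ValueError(f"invalid receipt status: {event['status']!r}")
    return event


if __name__ == "__main__":
    print(parse_event('{"type": "send", "to": "userB", "text": "hi"}'))
```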

3. High-level architecture

Clients open a WebSocket to a connection gateway. Because connections are long-lived and sticky, the system must know which gateway server currently holds each user's socket — that mapping lives in a fast store (Redis) updated on connect/disconnect.
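
The registry can be sketched as follows. A plain dict stands in for Redis, the class and method names are hypothetical, and the sketch assumes one connection per user for brevity (a multi-device system would key on user and device).

```python
from typing import Optional


class ConnectionRegistry:
    """user -> gateway mapping. A dict stands in for Redis in this sketch."""

    def __init__(self):
        self._gateway_by_user = {}

    def on_connect(self, user_id: str, gateway_id: str) -> None:
        # The newest connection wins; any older entry is overwritten.
        self._gateway_by_user[user_id] = gateway_id

    def on_disconnect(self, user_id: str, gateway_id: str) -> None:
        # Only clear the entry if this gateway still owns it. Otherwise a late
        # disconnect from an old gateway would erase a fresh reconnect elsewhere.
        if self._gateway_by_user.get(user_id) == gateway_id:
            del self._gateway_by_user[user_id]

    def lookup(self, user_id: str) -> Optional[str]:
        """Return the gateway holding the user's socket, or None if offline."""
        return self._gateway_by_user.get(user_id)


if __name__ == "__main__":
    registry = ConnectionRegistry()
    registry.on_connect("userB", "gateway-1")
    registry.on_connect("userB", "gateway-2")     # user reconnects elsewhere
    registry.on_disconnect("userB", "gateway-1")  # stale disconnect arrives late
    print(registry.lookup("userB"))
```

The ownership check in `on_disconnect` is a design choice of this sketch rather than something the walkthrough prescribes; it guards against the reconnect race described in the comment.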

When A sends a message, A's gateway forwards it to the chat service, which persists it and then looks up which gateway holds B's connection. If B is online, the message is delivered through that gateway; if B is offline, the service enqueues a push notification. Cross-server delivery uses a pub/sub channel so any gateway can hand a message to any other.
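
The send path described above can be sketched in a few lines. Everything here is an in-memory stand-in: a list for the message store, a dict for the connection registry, per-gateway lists for the pub/sub channels, and a list for the push queue. `ChatService` and `handle_send` are hypothetical names.

```python
import itertools
import time


class ChatService:
    """Persist first, then route to the recipient's gateway or fall back to push."""

    def __init__(self, registry: dict, gateway_inboxes: dict, push_queue: list):
        self.registry = registry                # user -> gateway id (stand-in for Redis)
        self.gateway_inboxes = gateway_inboxes  # gateway id -> events (stand-in for pub/sub)
        self.push_queue = push_queue            # stand-in for a push notification queue
        self.message_log = []                   # stand-in for the durable message store
        self._ids = itertools.count(1)

    def handle_send(self, sender: str, recipient: str, text: str) -> str:
        message = {
            "type": "message",
            "id": next(self._ids),
            "from": sender,
            "to": recipient,
            "text": text,
            "ts": time.time(),
        }
        # 1. Persist before attempting delivery so the message cannot be lost.
        self.message_log.append(message)

        # 2. Look up which gateway holds the recipient's socket.
        gateway_id = self.registry.get(recipient)
        if gateway_id is not None:
            # 3a. Online: hand the message to that gateway's channel.
            self.gateway_inboxes.setdefault(gateway_id, []).append(message)
            return "delivered_to_gateway"

        # 3b. Offline: enqueue a push notification instead.
        self.push_queue.append({"to": recipient, "messageId": message["id"]})
        return "push_enqueued"
```

For example, with `registry = {"userB": "gateway-2"}`, a send to `userB` lands in `gateway_inboxes["gateway-2"]`, while a send to an unregistered user only adds an entry to `push_queue`. Both messages are in `message_log` either way.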

Presence is tracked with heartbeats: clients ping periodically, and a Redis entry with a TTL marks them online; a missed heartbeat expires the entry. Read/delivery receipts are just small status events flowing back through the same path.
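
A minimal sketch of TTL-based presence follows. A dict of expiry times stands in for Redis keys with a TTL, the 30-second TTL is an assumed value, and the clock is injectable so the behaviour can be exercised without waiting. `PresenceTracker` is a hypothetical name.

```python
import time


class PresenceTracker:
    """Online markers that expire unless refreshed by a heartbeat."""

    def __init__(self, ttl_seconds: float = 30.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds  # assumed TTL; tune against heartbeat interval
        self._clock = clock
        self._expires_at = {}           # user id -> time at which the marker expires

    def heartbeat(self, user_id: str) -> None:
        # Each ping pushes the expiry forward, like re-setting a key with a TTL.
        self._expires_at[user_id] = self._clock() + self.ttl_seconds

    def is_online(self, user_id: str) -> bool:
        expires_at = self._expires_at.get(user_id)
        if expires_at is None:
            return False
        if self._clock() >= expires_at:
            # Mimic TTL expiry: a missed heartbeat flips the user to offline.
            del self._expires_at[user_id]
            return False
        return True


if __name__ == "__main__":
    now = [0.0]
    tracker = PresenceTracker(ttl_seconds=30.0, clock=lambda: now[0])
    tracker.heartbeat("userA")
    now[0] = 29.0
    print(tracker.is_online("userA"))  # still within the TTL
    now[0] = 31.0
    print(tracker.is_online("userA"))  # heartbeat missed, marker expired
```

The TTL should be somewhat longer than the heartbeat interval so that a single delayed ping does not flip a connected user to offline.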

4. Data model & storage

Messages are stored partitioned by conversation ID and ordered by time, so loading a chat is a single efficient range scan. A wide-column store (Cassandra/HBase) fits this access pattern at scale.
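
The access pattern can be illustrated with an in-memory model: one sorted list per conversation plays the role of a partition, and a time-range query becomes a binary search plus a slice. This is only a sketch of the layout, not Cassandra or HBase code, and `MessageStore` is a hypothetical name.

```python
import bisect
from collections import defaultdict


class MessageStore:
    """Partition by conversation ID; keep each partition sorted by timestamp."""

    def __init__(self):
        # conversation_id -> list of (ts, message_id, text), kept in sorted order
        self._partitions = defaultdict(list)

    def append(self, conversation_id: str, ts: float, message_id: str, text: str) -> None:
        bisect.insort(self._partitions[conversation_id], (ts, message_id, text))

    def range_scan(self, conversation_id: str, start_ts: float, end_ts: float) -> list:
        """Return rows with start_ts <= ts < end_ts from a single partition."""
        rows = self._partitions.get(conversation_id, [])
        # A 1-tuple sorts before any longer tuple with the same first element,
        # so (start_ts,) finds the first row whose timestamp is >= start_ts.
        lo = bisect.bisect_left(rows, (start_ts,))
        hi = bisect.bisect_left(rows, (end_ts,))
        return rows[lo:hi]


if __name__ == "__main__":
    store = MessageStore()
    store.append("conv-1", 30.0, "m3", "third")
    store.append("conv-1", 10.0, "m1", "first")
    store.append("conv-1", 20.0, "m2", "second")
    store.append("conv-2", 15.0, "m9", "other chat")
    print(store.range_scan("conv-1", 10.0, 30.0))
```

Because every row for a conversation lives in one partition and is already ordered, loading recent history never touches other conversations.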

Per-device sync tracks the last message each device has seen, so a phone and laptop both catch up correctly. Group messages fan out to each member's delivery path.
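
One way to picture per-device sync is a cursor per (user, device) pair. The sketch below assumes each message carries a per-conversation sequence number `seq`, which the text does not specify; `SyncState` and its methods are hypothetical names.

```python
class SyncState:
    """Track, per device, the highest message sequence number acknowledged."""

    def __init__(self):
        self._last_seen = {}  # (user_id, device_id) -> last acknowledged seq

    def messages_to_sync(self, user_id: str, device_id: str, conversation_log: list) -> list:
        """Return the messages this device has not acknowledged yet."""
        cursor = self._last_seen.get((user_id, device_id), 0)
        return [m for m in conversation_log if m["seq"] > cursor]

    def acknowledge(self, user_id: str, device_id: str, seq: int) -> None:
        # Cursors only move forward, so a late or duplicate ack is harmless.
        key = (user_id, device_id)
        self._last_seen[key] = max(self._last_seen.get(key, 0), seq)


if __name__ == "__main__":
    log = [{"seq": 1, "text": "hi"}, {"seq": 2, "text": "there"}, {"seq": 3, "text": "!"}]
    sync = SyncState()
    sync.acknowledge("userB", "phone", 3)   # phone is fully caught up
    sync.acknowledge("userB", "laptop", 1)  # laptop was closed after message 1
    print(sync.messages_to_sync("userB", "phone", log))
    print(sync.messages_to_sync("userB", "laptop", log))
```

Keeping the cursor per device rather than per user is what lets the laptop catch up on messages the phone has already shown.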

5. Scaling and bottlenecks

Many gateway servers behind a load balancer, each holding a slice of connections. They keep no durable state beyond their open sockets, so you scale out by adding servers.
Connection registry in Redis (user → server) so messages route to the right gateway in one lookup.
Pub/sub backbone (Redis/Kafka) for cross-server message handoff.
Shard message storage by conversation ID; archive old messages to cheaper storage.
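
Sharding by conversation ID needs a hash that is stable across processes and restarts. The sketch below uses SHA-256 with a simple modulo; `shard_for` is a hypothetical helper, and the modulo scheme is a simplification chosen for brevity.

```python
import hashlib


def shard_for(conversation_id: str, shard_count: int) -> int:
    """Map a conversation ID to a shard index in [0, shard_count)."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # Python's built-in hash() is salted per process for strings, so it is not
    # safe for routing; a cryptographic digest gives the same answer everywhere.
    digest = hashlib.sha256(conversation_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


if __name__ == "__main__":
    for conversation_id in ("conv-1", "conv-2", "conv-3"):
        print(conversation_id, "->", shard_for(conversation_id, 16))
```

Plain modulo remaps most keys when `shard_count` changes, so a production system would more likely use consistent hashing or a partitioner built into the datastore.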

Key trade-offs the interviewer probes

WebSocket vs long-polling. WebSockets give true low-latency push and are the standard answer; long-polling is a fallback for restricted networks but wastes resources.
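Delivery guarantees. Aim for at-least-once delivery: the sender retries until the message is acknowledged, and the client de-duplicates by message ID. Exactly-once delivery is impractical across an unreliable network, so duplicates are tolerated and filtered out instead.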
Group fan-out. Small groups can fan out per-member; very large groups may need a pull model so one message doesn't trigger a huge write burst — the same push/pull tension as the news feed.
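
Client-side de-duplication, as described under delivery guarantees, can be as small as a set of seen IDs. In this sketch `ClientInbox` is a hypothetical name and the set grows without bound; a real client would likely cap it or key it off a per-conversation cursor.

```python
class ClientInbox:
    """Accept at-least-once delivery and show each message exactly once."""

    def __init__(self):
        self._seen_ids = set()  # unbounded here for simplicity
        self.displayed = []

    def receive(self, message: dict) -> bool:
        """Return True if the message was new, False if it was a duplicate."""
        if message["id"] in self._seen_ids:
            return False  # redelivery after a retry; safe to ignore
        self._seen_ids.add(message["id"])
        self.displayed.append(message)
        return True


if __name__ == "__main__":
    inbox = ClientInbox()
    msg = {"id": "m1", "from": "userA", "text": "hi"}
    print(inbox.receive(msg))    # first delivery
    print(inbox.receive(msg))    # server retried; duplicate is dropped
    print(len(inbox.displayed))
```

The client should still acknowledge a duplicate so the server stops retrying it.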

Framework reminder: every system design answer follows the same arc — requirements → estimates → API → high-level design → data model → scale → trade-offs. Keep the system design cheat sheet in mind and narrate which stage you're in.

FAQ

Why use WebSockets for a chat app?

WebSockets provide a persistent, bidirectional connection so the server can push messages to clients instantly without polling. That low-latency push is exactly what real-time chat needs; long-polling is only a fallback for networks that block WebSockets.

How do you route a message to the right server?

Maintain a connection registry (user to gateway server) in a fast store like Redis, updated on connect and disconnect. When a message arrives, look up the recipient's server and hand the message off, using a pub/sub backbone so any gateway can deliver to any other.

How do you handle offline users?

Persist every message durably first. If the recipient has no active connection, enqueue a push notification (APNs/FCM) and deliver the message when they reconnect and sync. Persistence-before-delivery is what guarantees no message is lost.

How is presence (online status) implemented?

With heartbeats: clients send a periodic ping, and the server stores an online marker with a TTL in Redis. A missed heartbeat lets the entry expire, flipping the user to offline. It's cheap and self-healing.

How do you guarantee messages aren't lost or duplicated?

Persist messages before attempting delivery, and use at-least-once delivery with client-side de-duplication by message ID. Exactly-once delivery is impractical in distributed systems, so deduplicating on a unique ID is the standard approach.
