
Design a Chat App

WhatsApp/Messenger-style real-time chat: WebSockets, routing messages to the right connection, presence, delivery guarantees, and history.

"Design a chat app" (WhatsApp, Messenger, Slack) is the canonical real-time system design question. The crux is maintaining millions of persistent connections and routing each message to the exact server holding the recipient's connection — plus presence, delivery receipts, and durable history.

Here's the full walkthrough with a diagram, covering the WebSocket gateway, connection routing, and delivery guarantees.

Chat app architecture: WebSocket gateway, chat service, presence service, message queue, message store

1. Clarify the requirements

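Functional requirements

Send and receive 1:1 and group messages in real time, show presence (online/offline), report delivery and read receipts, and keep durable message history that syncs across each of a user's devices.

Non-functional requirements

Low-latency delivery, no lost messages, and the ability to hold tens of millions of concurrent long-lived connections.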

Back-of-the-envelope scale: Assume 50M concurrent connections and billions of messages/day. The hard part isn't throughput — it's holding tens of millions of long-lived connections and finding the right one per message.
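A quick calculation makes the point concrete. Only the 50M concurrent connections comes from the estimate above. Reading "billions" as 10 billion messages/day and assuming one gateway server holds about 100,000 sockets are illustrative assumptions.

```python
# Back-of-the-envelope sketch. MESSAGES_PER_DAY and CONNS_PER_GATEWAY are
# illustrative assumptions; only the 50M concurrent connections is from the text.
import math

CONCURRENT_CONNECTIONS = 50_000_000
MESSAGES_PER_DAY = 10_000_000_000   # assumption: "billions" read as 10B/day
CONNS_PER_GATEWAY = 100_000         # assumption: sockets one gateway server can hold
SECONDS_PER_DAY = 24 * 60 * 60

# Average (not peak) write rate into the message store.
avg_msgs_per_sec = MESSAGES_PER_DAY / SECONDS_PER_DAY
# Number of gateway servers needed just to hold the open sockets.
gateway_servers = math.ceil(CONCURRENT_CONNECTIONS / CONNS_PER_GATEWAY)

print(f"average write rate: {avg_msgs_per_sec:,.0f} msgs/sec")
print(f"gateway servers needed: {gateway_servers}")
```

Under these assumptions the average write rate is roughly 116k messages/sec (peaks will be higher), which a partitioned store handles well. Spreading 50M sockets over about 500 gateways, and knowing which one holds each user, is the real design problem.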

2. API design

Chat is connection-oriented, not request/response. Clients hold a WebSocket; messages flow as events over it.

# Persistent connection
WS /connect                      (upgraded, authenticated)

# Events over the socket
-> { type: "send", to: "userB", text: "hi" }
<- { type: "message", from: "userA", text: "hi", id, ts }
<- { type: "receipt", messageId, status: "delivered" | "read" }
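A minimal server-side sketch of these events, assuming JSON text frames. The field names match the events above; using a UUID for `id` and epoch milliseconds for `ts` are assumptions, and `handle_send` and `receipt_frame` are hypothetical helper names.

```python
import json
import time
import uuid


def handle_send(sender: str, raw_frame: str) -> tuple[str, str]:
    """Turn a client's "send" frame into (recipient, outbound "message" frame)."""
    event = json.loads(raw_frame)
    if event.get("type") != "send":
        raise ValueError(f"expected a send event, got {event.get('type')!r}")
    outbound = {
        "type": "message",
        "from": sender,                  # taken from the authenticated socket, not the frame
        "text": event["text"],
        "id": str(uuid.uuid4()),         # assumption: server-assigned unique message ID
        "ts": int(time.time() * 1000),   # assumption: epoch milliseconds
    }
    return event["to"], json.dumps(outbound)


def receipt_frame(message_id: str, status: str) -> str:
    """Build the receipt event pushed back to the original sender."""
    if status not in ("delivered", "read"):
        raise ValueError(f"unknown receipt status: {status!r}")
    return json.dumps({"type": "receipt", "messageId": message_id, "status": status})
```

Note that `from` is filled in from the authenticated connection rather than trusted from the client's frame, so one user cannot send as another.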

3. High-level architecture

Clients open a WebSocket to a connection gateway. Because connections are long-lived and sticky, the system must know which gateway server currently holds each user's socket — that mapping lives in a fast store (Redis) updated on connect/disconnect.
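The registry logic can be sketched with an in-memory dict standing in for Redis; the class and method names are hypothetical. The one subtle rule is on disconnect: a user who drops and reconnects may land on a different gateway before the old server notices, so the old server should only clear the entry if it still points at itself.

```python
from typing import Optional


class ConnectionRegistry:
    """In-memory stand-in for the Redis user -> gateway mapping (a sketch)."""

    def __init__(self) -> None:
        self._user_to_gateway: dict[str, str] = {}

    def on_connect(self, user_id: str, gateway_id: str) -> None:
        # A reconnect simply overwrites the previous entry.
        self._user_to_gateway[user_id] = gateway_id

    def on_disconnect(self, user_id: str, gateway_id: str) -> None:
        # Only clear the entry if it still points at this gateway; the user
        # may already have reconnected through a different server.
        if self._user_to_gateway.get(user_id) == gateway_id:
            del self._user_to_gateway[user_id]

    def lookup(self, user_id: str) -> Optional[str]:
        # None means the user has no live connection (offline).
        return self._user_to_gateway.get(user_id)
```

In Redis, that compare-then-delete has to be atomic, for example by running it as a Lua script with `EVAL`.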

When A sends a message, its gateway forwards it to the chat service, which persists it and looks up B's connection server. It delivers via that server if B is online; if B is offline, it enqueues a push notification. Cross-server delivery uses a pub/sub channel so any gateway can hand a message to any other.
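The send path can be sketched with in-memory stand-ins: a list for the message store, one list per gateway for its pub/sub channel, and a list for the push-notification queue. `ChatService` and its fields are hypothetical names; `lookup_gateway` would be the registry query in a real system.

```python
from collections import defaultdict
from typing import Callable, Optional


class ChatService:
    """Sketch of the send path: persist first, then route or fall back to push.

    The store, channels, and push queue are in-memory stand-ins (assumptions)
    for the message store, the pub/sub backbone, and APNs/FCM.
    """

    def __init__(self, lookup_gateway: Callable[[str], Optional[str]]) -> None:
        self.lookup_gateway = lookup_gateway
        self.store: list[dict] = []                                # durable message store
        self.channels: dict[str, list[dict]] = defaultdict(list)  # one channel per gateway
        self.push_queue: list[dict] = []                           # offline notifications

    def send(self, message: dict) -> str:
        self.store.append(message)                    # 1. persist before any delivery
        gateway = self.lookup_gateway(message["to"])  # 2. find the recipient's server
        if gateway is not None:
            self.channels[gateway].append(message)    # 3a. publish to that gateway's channel
            return f"published to {gateway}"
        self.push_queue.append(message)               # 3b. offline: enqueue a push
        return "queued push"
```

The ordering is the point: the message is durable before any delivery attempt, so if delivery fails the recipient still gets it on the next sync.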

Presence is tracked with heartbeats: clients ping periodically, and a Redis entry with a TTL marks them online; a missed heartbeat expires the entry. Read/delivery receipts are just small status events flowing back through the same path.
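A sketch of the heartbeat-plus-TTL scheme, with a dict of expiry times standing in for Redis keys written with `SET ... EX`. The 30-second TTL is an assumption; the idea is that it spans a few heartbeat intervals so a single dropped ping doesn't flip the user offline. The injectable clock exists only to make the sketch testable.

```python
import time
from typing import Callable


class Presence:
    """Heartbeat-based presence with TTL expiry (a sketch of the Redis pattern)."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds  # assumption: 30s, i.e. a few heartbeat intervals
        self.clock = clock
        self._expires_at: dict[str, float] = {}

    def heartbeat(self, user_id: str) -> None:
        # Each ping pushes the expiry forward, like re-running SET key value EX ttl.
        self._expires_at[user_id] = self.clock() + self.ttl

    def is_online(self, user_id: str) -> bool:
        expires = self._expires_at.get(user_id)
        if expires is None or expires <= self.clock():
            self._expires_at.pop(user_id, None)  # lazily drop the expired entry
            return False
        return True
```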

4. Data model & storage

Messages are stored partitioned by conversation ID and ordered by time, so loading a chat is a single efficient range scan. A wide-column store (Cassandra/HBase) fits this access pattern at scale.
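The access pattern can be sketched in Python: one sorted list per conversation (the partition), ordered by `(timestamp, message_id)` (the clustering key), so loading a page of history is a binary search plus a slice. In Cassandra this would roughly correspond to `PRIMARY KEY ((conversation_id), ts, message_id)`; that schema and the `MessageStore` name are illustrative assumptions.

```python
import bisect
from collections import defaultdict


class MessageStore:
    """Sketch of the wide-column layout: partition = conversation_id,
    rows clustered by (ts, message_id) so a chat loads with one range scan."""

    def __init__(self) -> None:
        # conversation_id -> sorted list of (ts, message_id, text)
        self._partitions: dict[str, list[tuple[int, str, str]]] = defaultdict(list)

    def append(self, conversation_id: str, ts: int, message_id: str, text: str) -> None:
        # Keep each partition sorted by (ts, message_id).
        bisect.insort(self._partitions[conversation_id], (ts, message_id, text))

    def load_page(self, conversation_id: str, before_ts: int,
                  limit: int) -> list[tuple[int, str, str]]:
        """Return up to `limit` messages with ts < before_ts, oldest first."""
        rows = self._partitions.get(conversation_id, [])
        # (before_ts,) sorts before any row with that ts, so `end` is the
        # index of the first row with ts >= before_ts.
        end = bisect.bisect_left(rows, (before_ts,))
        return rows[max(0, end - limit):end]
```

Including `message_id` in the key keeps two messages that share a timestamp from overwriting each other.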

Per-device sync tracks the last message each device has seen, so a phone and laptop both catch up correctly. Group messages fan out to each member's delivery path.
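Per-device sync can be sketched as a cursor per (user, device) over each user's ordered message log. The sequence-number scheme and all names here are assumptions for illustration. The group fan-out includes the sender, so the sender's other devices also pick up the message they sent.

```python
from collections import defaultdict


class DeviceSync:
    """Per-device cursors over each user's message log (a sketch).

    Assumption: each user's log assigns a monotonically increasing sequence
    number, so a device's cursor is just the last sequence number it has seen.
    """

    def __init__(self) -> None:
        self.log: dict[str, list[dict]] = defaultdict(list)  # user -> messages in seq order
        self.cursor: dict[tuple[str, str], int] = {}         # (user, device) -> last seq seen

    def deliver(self, user_id: str, message: dict) -> None:
        # Copy the message into this user's log with the next sequence number.
        self.log[user_id].append({**message, "seq": len(self.log[user_id]) + 1})

    def fan_out(self, members: list[str], message: dict) -> None:
        # Group message: one copy per member's delivery path, sender included.
        for member in members:
            self.deliver(member, message)

    def sync(self, user_id: str, device_id: str) -> list[dict]:
        """Return messages this device hasn't seen yet and advance its cursor."""
        last_seen = self.cursor.get((user_id, device_id), 0)
        missing = self.log[user_id][last_seen:]  # seq k lives at index k - 1
        if missing:
            self.cursor[(user_id, device_id)] = missing[-1]["seq"]
        return missing
```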

5. Scaling and bottlenecks

Key trade-offs the interviewer probes

Framework reminder: every system design answer follows the same arc — requirements → estimates → API → high-level design → data model → scale → trade-offs. Keep the system design cheat sheet in mind and narrate which stage you're in.


FAQ

Why use WebSockets for a chat app?

WebSockets provide a persistent, bidirectional connection so the server can push messages to clients instantly without polling. That low-latency push is exactly what real-time chat needs; long-polling is only a fallback for networks that block WebSockets.

How do you route a message to the right server?

Maintain a connection registry (user to gateway server) in a fast store like Redis, updated on connect and disconnect. When a message arrives, look up the recipient's server and hand the message off, using a pub/sub backbone so any gateway can deliver to any other.

How do you handle offline users?

Persist every message durably first. If the recipient has no active connection, enqueue a push notification (APNs/FCM) and deliver the message when they reconnect and sync. Persistence-before-delivery is what guarantees no message is lost.

How is presence (online status) implemented?

With heartbeats: clients send a periodic ping, and the server stores an online marker with a TTL in Redis. A missed heartbeat lets the entry expire, flipping the user to offline. It's cheap and self-healing.

How do you guarantee messages aren't lost or duplicated?

Persist messages before attempting delivery, and use at-least-once delivery with client-side de-duplication by message ID. Exactly-once delivery cannot be guaranteed over an unreliable network (an ack can always be lost), so at-least-once delivery plus de-duplication on a unique ID is the standard approach.
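A minimal client-side sketch of the de-duplication step: the client remembers recently seen message IDs, ignores repeats, and still acks them so the server stops retrying. `DedupingInbox` and the `max_ids` bound are hypothetical; the bound just needs to cover the server's redelivery window.

```python
from collections import OrderedDict


class DedupingInbox:
    """Client-side de-duplication for at-least-once delivery (a sketch).

    The server may redeliver a message whose ack was lost, so the client
    remembers recent message IDs. max_ids is an illustrative assumption.
    """

    def __init__(self, max_ids: int = 10_000) -> None:
        self.max_ids = max_ids
        self._seen: OrderedDict[str, None] = OrderedDict()  # insertion-ordered ID set
        self.shown: list[dict] = []                          # messages actually displayed

    def receive(self, message: dict) -> bool:
        """Return True if the message is new; the caller acks it either way."""
        msg_id = message["id"]
        if msg_id in self._seen:
            return False                    # duplicate: ack again, but don't display
        self._seen[msg_id] = None
        if len(self._seen) > self.max_ids:
            self._seen.popitem(last=False)  # forget the oldest remembered ID
        self.shown.append(message)
        return True
```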