A detailed, end-to-end explanation of how a large-scale messaging system like WhatsApp works — from sending a message to delivery, storage, encryption, and global scaling.
This section gives a quick end-to-end view of how a WhatsApp message travels through the system.
Key idea: Messages are stored first, delivered fast, and never permanently stored after successful delivery.
Before diving into the solution, we need to understand what a real-time chat system must accomplish and the constraints it operates under.
A real-time chat system enables instant message delivery between users across devices and networks. The system must handle one-to-one conversations, group chats, offline message delivery, and maintain message ordering while scaling to billions of daily messages.
Peak message throughput: 100B messages/day / 86,400 s × 2 (peak factor) ≈ 2.3 million msg/sec
Daily message storage: 100B messages × 100 bytes ≈ 10 TB/day
Concurrent connections: 500M DAU × 20% online ≈ 100 million connections
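The three estimates above can be reproduced with a few lines of arithmetic. This is only a sketch: the variable names are illustrative, and the inputs are the figures assumed in the estimates themselves.

```python
# Back-of-envelope capacity estimates using the figures stated above.
SECONDS_PER_DAY = 86_400

messages_per_day = 100e9      # 100B messages/day
peak_factor = 2               # peak traffic assumed to be 2x the daily average
avg_message_bytes = 100       # assumed average message size
daily_active_users = 500e6    # 500M DAU
online_fraction = 0.20        # assumed share of DAU connected at the same time

peak_msgs_per_sec = messages_per_day / SECONDS_PER_DAY * peak_factor
storage_bytes_per_day = messages_per_day * avg_message_bytes
concurrent_connections = daily_active_users * online_fraction

print(f"peak throughput: {peak_msgs_per_sec / 1e6:.1f} million msg/sec")
print(f"daily storage:   {storage_bytes_per_day / 1e12:.0f} TB/day")
print(f"connections:     {concurrent_connections / 1e6:.0f} million")
```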
Key Insight: The system is write-heavy with massive concurrent connections. WebSocket efficiency and horizontal scaling are critical design considerations.
The system consists of several interconnected services that handle different responsibilities.
┌─────────────┐     ┌──────────────┐     ┌─────────────────┐
│   Mobile    │────▶│     Load     │────▶│    WebSocket    │
│   Client    │◀────│   Balancer   │◀────│     Gateway     │
└─────────────┘     └──────────────┘     └────────┬────────┘
                                                  │
                   ┌──────────────────────────────┼──────────────┐
                   │                              │              │
             ┌─────▼─────┐              ┌─────────▼────────┐     │
             │  Message  │              │     Presence     │     │
             │  Service  │              │     Service      │     │
             └─────┬─────┘              └──────────────────┘     │
                   │                                             │
        ┌──────────┼──────────┐                     ┌────────────▼───┐
        │          │          │                     │     Group      │
   ┌────▼────┐ ┌───▼───┐ ┌────▼────┐                │    Service     │
   │ Message │ │ Redis │ │  Push   │                └────────────────┘
   │   DB    │ │ Cache │ │ Service │
   └─────────┘ └───────┘ └─────────┘

WebSocket Gateway: Maintains persistent connections with clients, authenticates users, routes messages between users and backend services, and manages connection heartbeats.
Message Service: Handles message processing, validation, storage, and delivery logic. Ensures messages are persisted before acknowledgment.
Presence Service: Tracks which users are online, manages last-seen timestamps, and propagates status changes to relevant contacts.
Group Service: Manages group membership, permissions, and message fan-out to all group participants.
Real-time messaging requires persistent connections between clients and servers. HTTP polling wastes resources on repeated requests that usually return nothing new, so the system uses WebSocket connections for efficient bidirectional communication.
The system must track which server holds each user's connection. A distributed cache like Redis stores this mapping:
User ID → {
  server_id: "gateway-server-42",
  connection_id: "conn-abc123",
  last_heartbeat: "2024-01-15T10:30:00Z"
}

When User A sends a message to User B:
1. Look up User B's connection location in Redis
2. Route the message to that specific gateway server
3. The gateway pushes the message through User B's WebSocket
4. User B's client acknowledges receipt
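The lookup-and-route steps above can be sketched as follows. This is a minimal illustration, not the actual implementation: a plain dictionary stands in for Redis, and `ConnectionRegistry`, `route_message`, and the `gateways` callback table are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ConnectionInfo:
    server_id: str
    connection_id: str


class ConnectionRegistry:
    """In-memory stand-in for the Redis mapping of user ID -> connection."""

    def __init__(self) -> None:
        self._by_user: Dict[str, ConnectionInfo] = {}

    def register(self, user_id: str, info: ConnectionInfo) -> None:
        self._by_user[user_id] = info

    def lookup(self, user_id: str) -> Optional[ConnectionInfo]:
        return self._by_user.get(user_id)


def route_message(
    registry: ConnectionRegistry,
    gateways: Dict[str, Callable[[str, bytes], None]],
    recipient_id: str,
    payload: bytes,
) -> bool:
    """Return True if the payload was handed to the recipient's gateway."""
    info = registry.lookup(recipient_id)   # step 1: find the connection
    if info is None:
        return False                       # offline: the caller queues the message
    push = gateways.get(info.server_id)    # step 2: pick that gateway server
    if push is None:
        return False                       # stale registry entry
    push(info.connection_id, payload)      # step 3: push over the WebSocket
    return True


# Usage: one fake gateway that records what it was asked to push.
pushed: List[Tuple[str, bytes]] = []
registry = ConnectionRegistry()
registry.register("user-b", ConnectionInfo("gateway-server-42", "conn-abc123"))
gateways = {"gateway-server-42": lambda conn, data: pushed.append((conn, data))}

online_result = route_message(registry, gateways, "user-b", b"ciphertext")
offline_result = route_message(registry, gateways, "user-c", b"ciphertext")
```

Step 4, the client acknowledgment, is left out here; it would arrive later as a separate message from User B's client.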
Connections drop due to network issues, app switches, or device sleep. The system handles this through:
Clients send periodic pings; missing heartbeats trigger cleanup after 30 seconds.
Clients automatically reconnect with exponential backoff and resume from last message.
Undelivered messages stay queued while the recipient is offline and are delivered when the recipient reconnects.
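A rough sketch of the heartbeat check and the reconnect backoff follows. The 30-second timeout comes from the text; the 1-second base delay, the 60-second cap, and the random jitter are assumptions added for illustration, and the function names are hypothetical.

```python
import random
from typing import Optional

HEARTBEAT_TIMEOUT_S = 30.0  # from the text: clean up after 30 seconds of silence


def is_connection_stale(last_heartbeat: float, now: float,
                        timeout: float = HEARTBEAT_TIMEOUT_S) -> bool:
    """Server side: True once no heartbeat has arrived for longer than the timeout."""
    return now - last_heartbeat > timeout


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng: Optional[random.Random] = None) -> float:
    """Client side: exponential backoff with full jitter.

    The ceiling doubles with each attempt up to `cap`; the actual delay is a
    random value in [0, ceiling] so reconnecting clients do not all retry at once.
    """
    rng = rng if rng is not None else random.Random()
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)


# Usage: delays for the first few reconnect attempts (values vary per run).
delays = [backoff_delay(attempt) for attempt in range(8)]
```

Jitter matters most after a gateway restart, when many clients lose their connections at the same moment.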
Message {
  message_id: UUID            // Globally unique identifier
  conversation_id: UUID       // Groups messages in a chat
  sender_id: UUID             // Who sent the message
  content: encrypted_bytes    // Encrypted message payload
  content_type: enum          // text, image, video, audio
  timestamp: datetime         // Server-assigned timestamp
  status: enum                // sent, delivered, read
}

Message store: Cassandra or ScyllaDB
Message queue: Apache Kafka
When User A sends a message to User B:
1. Client sends encrypted message over WebSocket
2. Gateway authenticates and forwards to Message Service
3. Message Service validates content and rate limits
4. Message writes to primary database
5. Acknowledgment sent to User A (single checkmark)
6. System checks User B's connection status
7. If online: push through WebSocket gateway
8. If offline: queue for push notification
9. Delivery confirmation updates status (double checkmark)
10. Read receipt notifies User A when viewed
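The status changes in steps 5, 9, and 10 can be modeled as a value that only moves forward. The sketch below is one way to express that; the rule that late or duplicate acknowledgments never downgrade a status is an assumption, not something the flow above states.

```python
from enum import IntEnum


class Status(IntEnum):
    SENT = 1        # persisted by the server (single checkmark)
    DELIVERED = 2   # recipient device acknowledged (double checkmark)
    READ = 3        # recipient viewed the message


def advance(current: Status, event: Status) -> Status:
    """Apply a status event, keeping whichever state is further along."""
    return max(current, event)


# Usage: a read receipt arrives before a delayed delivery confirmation.
status = Status.SENT
status = advance(status, Status.READ)
status = advance(status, Status.DELIVERED)  # arrives late, ignored
```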
Users frequently go offline. The system must reliably deliver messages when they return.
Push notification flow:
1. Message Service detects recipient is offline
2. Push Service receives delivery request
3. Platform-specific push sent (APNs/FCM)
4. Push contains minimal data (sender, preview)
5. Full message syncs when app opens
Sync on reconnect:
1. Client reports last received message ID
2. Server queries all messages after that ID
3. Messages batch and stream to client
4. Client acknowledges receipt
5. Delivery status propagates to senders
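The catch-up query and batching can be sketched as below. It assumes the server keeps each user's pending messages in an ordered list; because the message IDs are UUIDs, "after that ID" is resolved by position in that list rather than by comparing IDs. The function names and the full-resync fallback are hypothetical.

```python
from typing import Dict, Iterator, List, Optional


def messages_after(log: List[Dict[str, str]],
                   last_id: Optional[str]) -> List[Dict[str, str]]:
    """Return the messages stored after last_id, in order.

    If last_id is None or not found, everything is returned (full resync).
    """
    if last_id is None:
        return list(log)
    for index, message in enumerate(log):
        if message["message_id"] == last_id:
            return log[index + 1:]
    return list(log)


def batches(items: List[Dict[str, str]], size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield consecutive slices of at most `size` messages."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Usage: the client last saw m2, so m3..m5 are streamed in batches of two.
log = [{"message_id": f"m{i}", "content": f"payload {i}"} for i in range(1, 6)]
missed = messages_after(log, "m2")
sent_batches = [[m["message_id"] for m in batch] for batch in batches(missed, 2)]
```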
Group chats introduce fan-out complexity where one message reaches many recipients.
Fan-out on write: When a message arrives, immediately create copies for each recipient.
✓ Fast reads, simple recipient logic
✗ High write amplification, storage overhead
Best for: Small groups (under 100 members)
Fan-out on read: Store message once, recipients query their groups.
✓ Efficient storage, single write
✗ Slower reads, complex query logic
Best for: Large groups or broadcast channels
Recommended Approach: Use a hybrid strategy - fan-out on write for small groups (faster UX) and fan-out on read for large groups (efficient storage). WhatsApp uses this pattern.
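A toy version of the hybrid strategy is sketched below. The 100-member threshold comes from the text; `GroupMessageStore` and its in-memory dictionaries are hypothetical stand-ins for real storage.

```python
from collections import defaultdict
from typing import Dict, List

SMALL_GROUP_LIMIT = 100  # from the text: fan-out on write for groups under 100 members


class GroupMessageStore:
    def __init__(self) -> None:
        self.members: Dict[str, List[str]] = {}                       # group -> user IDs
        self.inboxes: Dict[str, List[str]] = defaultdict(list)        # per-user copies
        self.group_logs: Dict[str, List[str]] = defaultdict(list)     # one copy per group

    def send(self, group_id: str, sender_id: str, message: str) -> str:
        members = self.members[group_id]
        if len(members) < SMALL_GROUP_LIMIT:
            # Fan-out on write: copy into every other member's inbox now.
            for user_id in members:
                if user_id != sender_id:
                    self.inboxes[user_id].append(message)
            return "fan-out on write"
        # Fan-out on read: store once; readers pull from the group log.
        self.group_logs[group_id].append(message)
        return "fan-out on read"

    def fetch(self, user_id: str) -> List[str]:
        """Inbox copies first, then the logs of any large groups the user is in."""
        messages = list(self.inboxes.get(user_id, []))
        for group_id, members in self.members.items():
            if len(members) >= SMALL_GROUP_LIMIT and user_id in members:
                messages.extend(self.group_logs.get(group_id, []))
        return messages


# Usage: one small group and one large group.
store = GroupMessageStore()
store.members["family"] = ["alice", "bob", "carol"]
store.members["announcements"] = [f"user-{i}" for i in range(150)] + ["bob"]
small_path = store.send("family", "alice", "dinner at 7?")
large_path = store.send("announcements", "user-0", "maintenance tonight")
```

The small group costs two inbox writes for one message, while the 151-member group costs a single write regardless of size.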
Message security requires encryption that prevents even the server from reading content.
1. Key Generation: Each device generates a public-private key pair
2. Key Exchange: Users exchange public keys when starting conversations
3. Session Establishment: Devices derive a shared secret through an X3DH key agreement over the published prekeys, which seeds the Double Ratchet
4. Message Encryption: The Double Ratchet derives a unique message key for each message
5. Forward Secrecy: Compromised keys cannot decrypt past messages
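Steps 4 and 5 rest on a one-way key chain, which the sketch below illustrates with the standard library only. It shows just the symmetric-key chain step, deriving keys with HMAC-SHA256 and two distinct constant inputs. It omits the key agreement and the Diffie-Hellman ratchet, the starting secret is a placeholder, and it is not suitable for production use.

```python
import hashlib
import hmac
from typing import List, Tuple


def ratchet_step(chain_key: bytes) -> Tuple[bytes, bytes]:
    """Derive (message_key, next_chain_key) from the current chain key.

    HMAC is one-way, so holding a later chain key does not reveal
    earlier chain keys or the message keys derived from them.
    """
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key


# Placeholder starting secret; a real session would get this from key agreement.
chain_key = hashlib.sha256(b"demo shared secret").digest()

message_keys: List[bytes] = []
for _ in range(3):
    message_key, chain_key = ratchet_step(chain_key)  # old chain key is discarded
    message_keys.append(message_key)
```

Each message gets its own 32-byte key, and the sender keeps only the newest chain key after every step.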
The server stores only public keys - it never has access to private keys or message content:
DeviceKey {
  user_id: UUID
  device_id: UUID
  identity_key: public_key          // Long-term identity
  signed_prekey: public_key         // Medium-term, rotated
  one_time_prekeys: [public_key]    // Single-use keys
}

Each component scales independently based on its specific bottleneck:
Global users require regional infrastructure:
Shard messages by conversation ID for optimal query performance:
Shard = hash(conversation_id) % number_of_shards

Benefits:
• Messages in one conversation reside together
• Queries for conversation history hit single shard
• Load distributes evenly across shards
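The shard formula can be sketched as below. SHA-256 is an assumed choice of hash: Python's built-in `hash()` is randomized per process for strings and bytes, so a stable digest is used to make every server compute the same shard. The function name `shard_for` is hypothetical.

```python
import hashlib
import uuid


def shard_for(conversation_id: uuid.UUID, number_of_shards: int) -> int:
    """Map a conversation to a shard index in [0, number_of_shards)."""
    digest = hashlib.sha256(conversation_id.bytes).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % number_of_shards


# Usage: the same conversation always lands on the same shard.
conversation = uuid.UUID("12345678-1234-5678-1234-567812345678")
first = shard_for(conversation, 16)
second = shard_for(conversation, 16)
used_shards = {shard_for(uuid.UUID(int=i), 8) for i in range(1000)}
```

With plain modulo sharding, changing `number_of_shards` remaps most conversations, so the shard count is usually fixed up front or replaced by a scheme such as consistent hashing.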