System Design Fundamentals

Understanding a Notification Service

A simple guide to how services like email, SMS, and push notifications are built and delivered at scale.

1. What is a Notification Service?

Imagine you order something online. You get an email confirmation, an SMS when it ships, and a push notification when it's delivered. Instead of each department (billing, shipping, etc.) sending these messages themselves, they all ask one central system—the Notification Service—to do it for them.

This service is a specialist in sending messages through different channels.

Common Channels:

  • Email (e.g., order a pizza, get a receipt)
  • SMS (e.g., "Your taxi has arrived")
  • Push (e.g., "Someone liked your photo")

2. Our Goals for the System

Before we build, let's decide what our service should do and how it should behave.

What It Must Do (Functional)

  • Send messages to different channels (Email, SMS, Push).
  • Send a message to one person or millions.
  • Keep track of whether a message was sent or failed.

How It Must Perform (Non-Functional)

  • Always available: It should keep accepting requests even if one part of it fails.
  • Fast: Send messages without long delays.
  • Reliable: Don't lose any messages.
  • Scalable: Can handle huge traffic spikes (like on Black Friday).

3. A Simple Plan (Architecture)

To avoid getting overwhelmed, we won't try to send the notification the instant we receive a request. Instead, we'll use a "to-do list" approach: we quickly add the task to a list and respond to the calling service right away, so it doesn't have to wait for the message to actually be delivered.

How it Works:

  1. Another service (e.g., the shopping app) sends a request: "Tell User X their order shipped."
  2. Our Notification Service quickly adds this job to a Message Queue (our to-do list).
  3. Specialist "Worker" programs are constantly checking the to-do list. An "Email Worker" grabs email jobs, an "SMS Worker" grabs SMS jobs, etc.
  4. The worker sends the message using a 3rd-Party Service (like a post office for email).
  5. Finally, the worker marks the job as "done" in a database.
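
The five steps can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the queue and the database are in-memory stand-ins, and `handle_request`, `run_worker_once`, and `fake_provider_send` are hypothetical names, not part of any real library.

```python
import queue

# Hypothetical in-memory stand-ins: a real system would use a durable
# message broker and a real database.
job_queue = queue.Queue()
status_db = {}   # job_id -> status
outbox = []      # records what the fake provider "sent"


def fake_provider_send(channel, recipient, text):
    """Stand-in for a 3rd-party email/SMS/push provider."""
    outbox.append((channel, recipient, text))


def handle_request(job_id, channel, recipient, text):
    """Steps 1-2: accept the request, enqueue it, and return immediately."""
    status_db[job_id] = "Pending"
    job_queue.put({"id": job_id, "channel": channel,
                   "recipient": recipient, "text": text})
    return {"accepted": True, "job_id": job_id}


def run_worker_once():
    """Steps 3-5: take one job, send it, and record the result."""
    job = job_queue.get_nowait()
    try:
        fake_provider_send(job["channel"], job["recipient"], job["text"])
        status_db[job["id"]] = "Sent"
    except Exception:
        status_db[job["id"]] = "Failed"
    finally:
        job_queue.task_done()


response = handle_request("job-1", "sms", "user-x", "Your order shipped.")
run_worker_once()
```

Note that `handle_request` returns before anything is sent. The caller only learns that the job was accepted, and the status record is what later shows whether it was delivered.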

4. The Key Parts Explained

The API Service (The Front Desk)

This is the public face of our service. It takes incoming requests from other services. Its only job is to validate the request and add it to the to-do list (Message Queue) as fast as possible.
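
A minimal sketch of that "validate, then enqueue" job might look like the following. The request fields (`channel`, `recipient`, `text`), the function name `submit_notification`, and the choice of status codes are assumptions made for illustration. A real API would sit behind a web framework and a durable queue.

```python
import queue
import uuid

ALLOWED_CHANNELS = {"email", "sms", "push"}
job_queue = queue.Queue()   # stand-in for a real message queue


def submit_notification(request):
    """Validate a request dict and enqueue it; return (status_code, body)."""
    channel = request.get("channel")
    recipient = request.get("recipient")
    text = request.get("text")

    if channel not in ALLOWED_CHANNELS:
        return 400, {"error": "unknown channel"}
    if not recipient or not text:
        return 400, {"error": "recipient and text are required"}

    job_id = str(uuid.uuid4())
    job_queue.put({"id": job_id, "channel": channel,
                   "recipient": recipient, "text": text})
    # 202 Accepted: the job is queued, not yet delivered.
    return 202, {"job_id": job_id}


ok_status, ok_body = submit_notification(
    {"channel": "email", "recipient": "user-x", "text": "Order shipped"})
bad_status, bad_body = submit_notification(
    {"channel": "fax", "recipient": "user-x", "text": "Order shipped"})
```

The function does no sending at all, which is what keeps it fast: the slow part (talking to a provider) happens later, in a worker.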

The Message Queue (The To-Do List)

This is the most important part! It's a highly reliable queue. By putting jobs here, the API can handle huge bursts of traffic without slowing down. If the sending workers crash, the to-do list remembers all the jobs that still need to be done.

The Workers (The Specialists)

These are the workhorses. You have a team of workers for each channel (Email, SMS, Push). They pick up a job from the queue, connect to the right 3rd-party service, send the message, and then grab the next job.
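
One way to picture this is a separate queue per channel, each with its own pool of worker threads. The sketch below assumes that layout. The worker counts are made-up numbers, and `send_via_provider` is a hypothetical stand-in for the real 3rd-party call.

```python
import queue
import threading

CHANNELS = ("email", "sms", "push")
queues = {name: queue.Queue() for name in CHANNELS}
sent = []                      # (channel, recipient) pairs, for illustration
sent_lock = threading.Lock()


def send_via_provider(channel, job):
    """Hypothetical stand-in for calling the channel's 3rd-party service."""
    with sent_lock:
        sent.append((channel, job["recipient"]))


def worker(channel):
    """Loop: take a job for this channel, send it, then take the next one."""
    q = queues[channel]
    while True:
        job = q.get()
        if job is None:        # sentinel: stop this worker
            q.task_done()
            break
        try:
            send_via_provider(channel, job)
        finally:
            q.task_done()


# Scale each channel independently, e.g. more email workers than SMS workers.
worker_counts = {"email": 3, "sms": 1, "push": 2}
threads = []
for channel, count in worker_counts.items():
    for _ in range(count):
        t = threading.Thread(target=worker, args=(channel,))
        t.start()
        threads.append(t)

queues["email"].put({"recipient": "a@example.com"})
queues["sms"].put({"recipient": "+15550100"})
queues["push"].put({"recipient": "device-42"})

for channel, count in worker_counts.items():
    for _ in range(count):
        queues[channel].put(None)   # one sentinel per worker
for t in threads:
    t.join()
```

Because each channel has its own queue and pool, a slow SMS provider only backs up the SMS queue, and email and push keep flowing.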

5. Storing Our Data

We need a database to keep track of what happened. We don't need a complex one. For each message, we just need to log:

  • What was the message?
  • Who was it for?
  • Which channel did it use (Email, SMS, etc.)?
  • What is its status (e.g., Pending, Sent, or Failed)?
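
Those four pieces of information map naturally onto a single table. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table and column names are assumptions chosen for illustration, and a production system would pick its own database and schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # in-memory database, for illustration only
conn.execute("""
    CREATE TABLE notifications (
        id        TEXT PRIMARY KEY,
        recipient TEXT NOT NULL,
        channel   TEXT NOT NULL CHECK (channel IN ('email', 'sms', 'push')),
        body      TEXT NOT NULL,
        status    TEXT NOT NULL DEFAULT 'Pending'
                  CHECK (status IN ('Pending', 'Sent', 'Failed'))
    )
""")

# A new job starts out as 'Pending' (the column default).
conn.execute(
    "INSERT INTO notifications (id, recipient, channel, body) VALUES (?, ?, ?, ?)",
    ("job-1", "user-x", "email", "Your order shipped."),
)
# A worker updates the status once the provider call has finished.
conn.execute("UPDATE notifications SET status = ? WHERE id = ?", ("Sent", "job-1"))
conn.commit()

row = conn.execute(
    "SELECT recipient, channel, status FROM notifications WHERE id = ?",
    ("job-1",),
).fetchone()
```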

6. What if Things Go Wrong?

Problem: The 3rd party service is down.

Solution: Don't give up on the first try. The worker should wait a bit and then retry a few times. Often, the problem is temporary.
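
A common way to "wait a bit and retry" is exponential backoff, where each wait is twice as long as the previous one. The sketch below assumes that strategy. The function name, the limit of three attempts, and the half-second starting delay are all illustrative choices rather than recommendations.

```python
import time


def send_with_retries(send, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send(); on failure wait, then retry, up to max_attempts times.

    Returns True if a call succeeded, False if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            send()
            return True
        except Exception:
            if attempt == max_attempts:
                return False
            # Exponential backoff: 0.5s, 1s, 2s, ...
            sleep(base_delay * 2 ** (attempt - 1))
    return False


# Demo: a fake provider that fails twice, then recovers.
calls = {"count": 0}
delays = []


def flaky_send():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("provider unavailable")


# Passing delays.append as `sleep` records the waits instead of really sleeping.
succeeded = send_with_retries(flaky_send, sleep=delays.append)
```

In the demo the third attempt succeeds, after recorded waits of 0.5 and 1.0 seconds.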

Problem: A message fails repeatedly.

Solution: After a few retries, if it still fails, move it to a special "failed jobs" list, called a Dead Letter Queue (DLQ). This stops it from blocking new messages, and an engineer can investigate later.
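
The routing logic can be sketched as follows, assuming each job carries its own attempt counter. The limit of three attempts, the job fields, and the in-memory queues are illustrative assumptions. Many real message brokers offer dead-letter handling as a built-in feature instead.

```python
import queue

MAX_ATTEMPTS = 3            # assumed retry limit
main_queue = queue.Queue()
dead_letter_queue = queue.Queue()


def process(job, send):
    """Try one job; requeue it on failure, or park it in the DLQ."""
    try:
        send(job)
        return "sent"
    except Exception as exc:
        job["attempts"] += 1
        job["last_error"] = str(exc)   # keep the reason for the engineer
        if job["attempts"] >= MAX_ATTEMPTS:
            dead_letter_queue.put(job)
            return "dead-lettered"
        main_queue.put(job)
        return "requeued"


def always_fails(job):
    """Fake provider call that never succeeds."""
    raise RuntimeError("invalid recipient")


main_queue.put({"id": "job-1", "attempts": 0})
outcomes = []
while not main_queue.empty():
    outcomes.append(process(main_queue.get(), always_fails))

dead_job = dead_letter_queue.get_nowait()
```

After the third failure the job leaves the main queue for good, so the loop ends and later jobs are not stuck behind it.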

Problem: We might send the same message twice.

Solution: This is a big one! Imagine your worker sends an email but crashes before it can mark the job as "done". Another worker might pick up the same job and send it again. To prevent this, we give every job a unique ID. Before sending, a worker checks a shared record to see whether a message with that ID has already been sent, and skips the job if it has. This is called idempotency.
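
A minimal sketch of the ID check is below. The `sent_ids` set is a stand-in for a store that all workers share (an assumption, since a per-process set would not help across workers), and `send_once` is a hypothetical helper name.

```python
sent_ids = set()     # stand-in for a shared store of already-sent job IDs
deliveries = []      # what the fake provider actually sent


def send_once(job, send):
    """Send the job unless its ID is already recorded as sent."""
    if job["id"] in sent_ids:
        return False             # duplicate delivery of the same job: skip
    send(job)
    sent_ids.add(job["id"])      # record only after a successful send
    return True


job = {"id": "job-1", "recipient": "user-x", "text": "Your order shipped."}
first = send_once(job, deliveries.append)
second = send_once(job, deliveries.append)   # e.g. the queue redelivered it
```

This sketch narrows the window for duplicates but does not close it: a crash between `send(job)` and `sent_ids.add(...)` would still allow a second send. A real system would need the check and the record to happen atomically in the shared store.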

Key Takeaways

  • Use a "to-do list" (Message Queue) so your system is fast and can handle traffic spikes.
  • Use specialist Workers for each notification type so you can scale them separately.
  • Never lose a message. Use retries and a Dead Letter Queue for failures.
  • Never send twice. Design your system to be idempotent to avoid duplicate messages.
