A simple guide to how notifications such as email, SMS, and push messages are built and delivered at scale.
Imagine you order something online. You get an email confirmation, an SMS when it ships, and a push notification when it's delivered. Instead of each department (billing, shipping, etc.) sending these messages themselves, they all ask one central system, the Notification Service, to do it for them.
This service is a specialist in sending messages through different channels.
Before we build, let's decide what our service should do and how it should behave.
To avoid getting overwhelmed, we won't try to send the notification the instant we receive a request. Instead, we'll use a "to-do list" approach. We quickly add the task to a list and respond to the caller right away, so it doesn't have to wait for the message to actually be sent.
Specialist "Worker" programs are constantly checking the to-do list. An "Email Worker" grabs email jobs, an "SMS Worker" grabs SMS jobs, etc.
The API is the public face of our service. It takes incoming requests from other services. Its only job is to validate the request and add it to the to-do list (Message Queue) as fast as possible.
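The validate-and-enqueue step can be sketched as follows. This is a minimal illustration, not a definitive implementation: the in-memory `queue.Queue`, the field names (`channel`, `recipient`, `body`), and the `handle_request` function are all assumptions made for the example.

```python
import queue
import uuid

ALLOWED_CHANNELS = {"email", "sms", "push"}

# In-memory stand-in for the Message Queue (assumption for this sketch).
message_queue = queue.Queue()

def handle_request(payload: dict) -> dict:
    """Validate a notification request and enqueue it without sending it."""
    channel = payload.get("channel")
    if channel not in ALLOWED_CHANNELS:
        return {"status": 400, "error": "unknown channel"}
    if not payload.get("recipient") or not payload.get("body"):
        return {"status": 400, "error": "recipient and body are required"}

    job = {
        "job_id": str(uuid.uuid4()),  # unique ID, useful later for de-duplication
        "channel": channel,
        "recipient": payload["recipient"],
        "body": payload["body"],
    }
    message_queue.put(job)
    # 202 Accepted: the job is queued, not yet delivered.
    return {"status": 202, "job_id": job["job_id"]}
```

Returning an "accepted" response instead of waiting for delivery is what keeps the API fast: the slow part (talking to an email or SMS provider) happens later, outside the request.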
The Message Queue is the most important part! It's a highly reliable queue. By putting jobs here, the API can handle huge bursts of traffic without slowing down. If the sending workers crash, the to-do list remembers all the jobs that still need to be done.
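One common way a queue "remembers" jobs after a worker crash is to keep each job until the worker acknowledges it. The toy class below sketches that idea under stated assumptions: `AckQueue` and its method names are hypothetical, and a real system would rely on a durable queue product rather than in-process memory.

```python
from collections import deque

class AckQueue:
    """Toy queue that keeps a job until a worker acknowledges it."""

    def __init__(self):
        self._ready = deque()   # jobs waiting for a worker
        self._in_flight = {}    # jobs handed out but not yet acknowledged

    def put(self, job_id, job):
        self._ready.append((job_id, job))

    def receive(self):
        """Hand out the next job, or None if nothing is ready."""
        if not self._ready:
            return None
        job_id, job = self._ready.popleft()
        self._in_flight[job_id] = job
        return job_id, job

    def ack(self, job_id):
        """The worker finished: forget the job for good."""
        self._in_flight.pop(job_id, None)

    def requeue_unacked(self):
        """A worker is presumed dead: make its unfinished jobs available again."""
        for job_id, job in self._in_flight.items():
            self._ready.append((job_id, job))
        self._in_flight.clear()

    def pending(self):
        """Number of jobs that are not yet acknowledged."""
        return len(self._ready) + len(self._in_flight)
```

A job that was received but never acknowledged still counts as pending, so it can be handed to another worker instead of being lost.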
The Workers are the workhorses. You have a team of workers for each channel (Email, SMS, Push). Each one picks up a job from the queue, connects to the right third-party service, sends the message, and then grabs the next job.
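The worker loop can be sketched like this. Everything named here is an assumption for illustration: one in-memory queue per channel, and placeholder `send_*` functions that stand in for calls to third-party providers by recording what they would have sent.

```python
import queue

# One queue per channel (assumption: a real setup might use topics or routing keys).
queues = {"email": queue.Queue(), "sms": queue.Queue(), "push": queue.Queue()}

# Records what the placeholder senders "sent", so the sketch is observable.
sent_log = []

def send_email(job):
    sent_log.append(("email", job["recipient"]))  # placeholder for an email provider call

def send_sms(job):
    sent_log.append(("sms", job["recipient"]))    # placeholder for an SMS provider call

def send_push(job):
    sent_log.append(("push", job["recipient"]))   # placeholder for a push provider call

SENDERS = {"email": send_email, "sms": send_sms, "push": send_push}

def run_worker(channel: str) -> int:
    """Process jobs for one channel until its queue is empty; return the count."""
    processed = 0
    while True:
        try:
            job = queues[channel].get_nowait()
        except queue.Empty:
            return processed
        SENDERS[channel](job)
        queues[channel].task_done()
        processed += 1
```

This version stops when the queue is empty so it is easy to run and test. A long-running worker would typically block on the queue and wait for the next job instead of returning.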
We need a database to keep track of what happened. We don't need a complex one. For each message, we just need to log:
Solution: Don't give up on the first try. The worker should wait a bit and then retry a few times. Often, the problem is temporary.
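A retry loop along these lines is one way to do it. The sketch assumes a doubling delay between attempts (exponential backoff), which the text above does not prescribe. `TemporaryError`, `send_with_retry`, and the default values are hypothetical names chosen for the example.

```python
import time

class TemporaryError(Exception):
    """Hypothetical error type for failures that are worth retrying."""

def send_with_retry(send, job, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send(job), retrying on TemporaryError with a doubling delay.

    Returns True if a send succeeded, False if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            send(job)
            return True
        except TemporaryError:
            if attempt == max_attempts:
                return False  # out of attempts, let the caller decide what to do
            # Wait 0.5s, then 1s, then 2s, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    return False
```

Passing `sleep` in as a parameter is a small design choice that makes the function testable without real waiting. Only `TemporaryError` is retried, so a permanent problem such as an invalid address would need its own error type and would not be retried.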
Solution: After a few retries, if it still fails, move it to a special "failed jobs" list, called a Dead Letter Queue (DLQ). This stops it from blocking new messages, and an engineer can investigate later.
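Moving a failed job to a Dead Letter Queue can be sketched as below. The in-memory `dead_letter_queue`, the `MAX_ATTEMPTS` value, and the `process_job` function are assumptions for illustration, and the delay between retries is left out to keep the example short.

```python
import queue

# In-memory stand-in for the Dead Letter Queue (assumption for this sketch).
dead_letter_queue = queue.Queue()
MAX_ATTEMPTS = 3

def process_job(job, send):
    """Try to send a job; park it in the DLQ after MAX_ATTEMPTS failures."""
    last_error = None
    for _ in range(MAX_ATTEMPTS):
        try:
            send(job)
            return "sent"
        except Exception as exc:  # broad on purpose: any failure counts as an attempt
            last_error = exc
    # Keep the job and the last error together so an engineer can investigate later.
    dead_letter_queue.put(
        {"job": job, "error": str(last_error), "attempts": MAX_ATTEMPTS}
    )
    return "dead-lettered"
```

Because the worker returns after dead-lettering, it is free to pick up the next job, which is what keeps one bad message from blocking the rest.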
Solution: This is a big one! Imagine your worker sends an email but crashes before it can mark the job as "done". Another worker might pick up the same job and send it again. To prevent this, we give every job a unique ID. Before sending, a worker checks if a message with that ID has already been sent. This is called idempotency.
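The check-before-send idea can be sketched as follows. This is a simplified illustration: the in-memory `sent_ids` set stands in for a shared store that all workers can see, and `send_once` and the `job_id` field are hypothetical names.

```python
# Stand-in for a shared store of already-sent job IDs (assumption: in a real
# deployment this would live in a database or cache visible to every worker).
sent_ids = set()

def send_once(job, send):
    """Send the job only if its ID has not been recorded as sent.

    Returns True if the message was sent now, False if it was skipped.
    """
    if job["job_id"] in sent_ids:
        return False  # duplicate delivery of the same job: do nothing
    send(job)
    sent_ids.add(job["job_id"])
    return True
```

Note that the check and the record are separate steps here, so two workers racing on the same job could still both send. A more robust version would record the ID with an atomic insert in the shared store before or together with the send.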