An original guide to engineering a service that protects your APIs from overuse and ensures stability. Learn the core algorithms and distributed architecture from the ground up.
Let's start by understanding the purpose of a rate limiter and defining our goals.
A rate limiter is like a bouncer at a club. It doesn't inspect who you are (that's authentication), but rather how often you're trying to get in. Its job is to control the flow of traffic to protect the services behind it.
The most common and effective place to put a rate limiter is at the edge of your system, typically within an API Gateway.
By implementing the rate limiter in the gateway, we create a single choke point for all incoming traffic. This protects every downstream service without them needing to know about rate limiting at all. The flow is simple:
1. Request arrives at the API Gateway.
2. Rate Limiter middleware checks if the request is allowed.
3. If YES: Forward request to the intended service.
4. If NO: Immediately reject with a `429 Too Many Requests` error.
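The four steps above can be sketched as a small middleware function. This is a minimal illustration, not a real gateway: `is_allowed` stands in for whichever rate-limiting algorithm we choose, and `Response`/`forward` are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass
class Response:
    status: int
    body: str

def handle(user_id: str, is_allowed, forward) -> Response:
    # Steps 1-2: the gateway asks the rate limiter whether this
    # request may proceed.
    if is_allowed(user_id):
        # Step 3: allowed -> pass the request to the downstream service.
        return forward(user_id)
    # Step 4: rejected -> fail fast without touching any backend.
    return Response(429, "Too Many Requests")
```

Because the check happens before any downstream call, a rejected request costs the system almost nothing.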
The "brain" of the rate limiter is the algorithm it uses to track and limit requests. Let's explore the most common ones.
Imagine a bucket that holds a set number of tokens. To make a request, you must grab a token. The bucket is refilled at a constant rate. If there are no tokens, you must wait.
✓ Pro: Great for handling bursts of traffic, as long as tokens are available.
✗ Con: Tuning the bucket size and refill rate for a given workload can be tricky.
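A minimal in-memory sketch of the token bucket, assuming a single process (a distributed version would keep this state in a shared store):

```python
import time

class TokenBucket:
    """Bucket holding up to `capacity` tokens, refilled at `rate` tokens/sec."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity          # start full, so bursts are allowed
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1            # spend one token for this request
            return True
        return False                    # no tokens left: reject
```

Note that refill happens lazily on each call, so no background timer is needed.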
Imagine a bucket with a small hole. Requests are "poured" into the bucket. The bucket processes requests out of the hole at a fixed, constant rate. If the bucket is full when a new request arrives, it is rejected.
✓ Pro: Smoothes out traffic into a steady stream, which is easy for downstream services to handle.
✗ Con: Bursts of requests are flattened, which might not be ideal for all use cases.
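The leaky bucket can be sketched as a "water level" counter that drains at a fixed rate; a new request is rejected when the level would overflow the bucket. This single-process version is illustrative only:

```python
import time

class LeakyBucket:
    """Bucket that drains at `leak_rate` requests/sec and holds `capacity`."""

    def __init__(self, capacity: float, leak_rate: float):
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.water = 0.0                # current fill level
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Water leaks out continuously at the fixed rate.
        self.water = max(0.0, self.water - (now - self.last) * self.leak_rate)
        self.last = now
        if self.water < self.capacity:
            self.water += 1             # pour this request into the bucket
            return True
        return False                    # bucket full: reject
```

Compared with the token bucket, the roles are inverted: here arrivals fill the bucket and the drain rate caps throughput, which is why output is smoothed rather than bursty.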
The simplest method. We count requests from a user in a fixed time window (e.g., 1 minute). If the count exceeds the limit, we reject further requests until the window resets.
✓ Pro: Very easy to implement and memory-efficient.
✗ Con: A burst straddling a window boundary (e.g., requests at 1:59 and 2:01 with one-minute windows) can allow up to double the intended rate.
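The fixed window is almost trivial to implement; here is a single-process sketch:

```python
import time

class FixedWindowCounter:
    """Allow at most `limit` requests per fixed window of `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        if now - self.window_start >= self.window:
            # Window expired: start a fresh one and reset the counter.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

Only two values per user are stored (a count and a window start), which is what makes this approach so memory-efficient.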
A hybrid approach that fixes the flaw in the Fixed Window. It smooths the rate by considering a weighted count from the previous window in addition to the current window's count.
✓ Pro: Good balance of performance and accuracy. Prevents the "edge" problem.
✗ Con: More complex to implement than a fixed window.
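A sketch of the sliding window counter, again assuming a single process. The key idea is the weighted estimate: the previous window's count contributes in proportion to how much of it still overlaps a window-sized interval ending now.

```python
import time

class SlidingWindowCounter:
    """Approximate sliding window: weight the previous fixed window's
    count by its remaining overlap with the window ending now."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.current_start = time.monotonic()
        self.current_count = 0
        self.previous_count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.current_start
        if elapsed >= self.window:
            # Roll windows forward; anything older than one full
            # window before the current one contributes nothing.
            self.previous_count = (self.current_count
                                   if elapsed < 2 * self.window else 0)
            self.current_start += (elapsed // self.window) * self.window
            self.current_count = 0
            elapsed = now - self.current_start
        # e.g. 25% into the current window -> previous window weighted 0.75.
        weight = (self.window - elapsed) / self.window
        estimated = self.previous_count * weight + self.current_count
        if estimated < self.limit:
            self.current_count += 1
            return True
        return False
```

This stores only three numbers per user yet avoids the boundary burst that the fixed window permits, which is the trade-off that makes it a popular default.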
Recommended Choice: A Sliding Window Counter provides a great mix of performance, memory efficiency, and accuracy for most modern applications.
A single rate limiter is a single point of failure. In a real system, we have multiple servers. How do they share rate limit data?
The solution is to use a centralized, high-speed data store like Redis. Every server in our API gateway talks to this central Redis cluster to share and update request counts.
For example, a shared per-user record might look like: `user_id_123: { count: 98, window_start: 1678886400 }`
With multiple servers reading and writing simultaneously, we can get a race condition (e.g., two servers read a count of 99, both allow the request, and both set the count to 100). To prevent this, we must use atomic operations. Redis provides commands like `INCR` which are atomic, meaning they are guaranteed to complete as a single, uninterruptible operation.
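The atomic-counter pattern can be sketched with redis-py style calls (`incr`, `expire`). The key format and window size are illustrative, and the tiny `FakeRedis` stub stands in for a real Redis client so the sketch is self-contained:

```python
import time

class FakeRedis:
    """In-memory stand-in exposing the two commands the sketch needs;
    a real deployment would use a redis-py client instead."""

    def __init__(self):
        self.store = {}

    def incr(self, key):
        self.store[key] = self.store.get(key, 0) + 1
        return self.store[key]

    def expire(self, key, seconds):
        pass  # a real client would set a TTL so stale windows vanish

def allow_request(r, user_id: str, limit: int, window: int) -> bool:
    # One key per user per fixed window. Because INCR is atomic,
    # two gateway servers can never both read 99 and both write 100:
    # each sees a distinct post-increment value.
    key = f"rate:{user_id}:{int(time.time() // window)}"
    count = r.incr(key)
    if count == 1:
        # First request in this window: expire the key with the window.
        r.expire(key, window)
    return count <= limit
```

Note the shift in style: instead of read-then-write, each server increments first and inspects the value INCR returns, so the check and the update are one indivisible step.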