By the way, ChatGPT suggested the title: The Art of Balancing Control and Accessibility
Background
Houston Airport had a big problem. Passengers complained about the time it took for luggage to arrive at the terminal building after the airplane had landed. The airport invested millions to solve this pain point. They improved the process, hired more people, and introduced new technology. They eventually succeeded in reducing the wait time to 7 minutes. However, passengers still complained. The airport realized that it had reached a point where further optimizing the process was no longer worthwhile. So they did something different: they reframed the problem. By reframing it, they discovered that the problem was not the time it took to get the luggage to the terminal building, but the time passengers spent waiting for it. The airport decided to park the airplanes further away from the terminal building. Consequently, it took passengers longer to reach the terminal building, which reduced the time they spent waiting for luggage, and voila! Complaints dropped drastically.
One lesson that could be learned from this story is that speed could have unintended consequences, especially when granted to the wrong client or in the wrong context. Therefore, it makes sense to control which traffic, and how much of it, accesses our resources. Whenever such control is lacking or ineffective, developer productivity suffers, as engineers spend more time responding to platform incidents (PIs). Existing controls could benefit from an additional dimension as long as such a dimension does not introduce unnecessary complexity or increase response time noticeably. This article will explore rate-limiting as an additional dimension of controlling access to our resources.
What Is Rate Limiting?
Rate limiting is a mechanism used to control consumption over time. This consumption over time is known as the rate. Hence, the term rate limiting. The goal of a rate-limiting system is to work well when the system is under heavy load. It needs to be built for the worst 1%, not the good 99%.
More Than Limiting
Rate limiting is more than limiting. It could also be used to shape traffic in various ways, for example, by smoothing bursts in traffic. Smoothing bursts increases the resiliency of the system.
Why Limit Rates?
It is easy to design a system that is 95% resilient. However, moving the resiliency dial to 99.99% requires a well-architected system. This is where rate-limiting, amongst other resiliency mechanisms, comes in. These mechanisms are gaining traction for the following reasons:
- Growth often happens during periods of high load.
- It is increasingly easy to exploit public resources due to the advent of AI and related tools.
With rate-limiting, we could achieve the following:
- Traffic shaping
- Prevent attacks — e.g., DDoS/brute force attacks.
- Prevent resource starvation — Some unusual traffic is caused by bots, errors in software, or configurations in some other part of the system, not malicious attacks.
- Improve developer productivity
- Save cost.
Common Rate-Limiting Algorithms (Optional Section)
Some common rate-limiting algorithms are:
- Token bucket
- Leaky bucket
- Fixed window counter
- Sliding window counter
- Sliding window logs
Token Bucket
A token bucket is a container that has a pre-defined capacity. Tokens are put in the bucket at preset rates periodically. Once the bucket is full, no more tokens are added. When a request arrives, we check if there is at least one token left in the bucket. If there is, we take one token out of the bucket, and the request goes through. If the bucket is empty, the request is dropped.
Pros
- Memory efficient.
- Accommodates burst/spike in traffic.
- Easy to implement.
Cons
- Needs to be adapted for distributed systems by achieving some atomicity when accessing the shared state of buckets.
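The token bucket described above can be sketched in a few lines of plain Java. This is a minimal, single-JVM illustration and not the implementation used by any of the libraries discussed later. The class name `TokenBucket` and the injectable nanosecond clock are assumptions made for the example, and the shared-state atomicity needed in a distributed system is deliberately left out.

```java
import java.util.function.LongSupplier;

/** Minimal single-JVM token bucket (illustrative sketch only). */
final class TokenBucket {
    private final long capacity;             // maximum number of tokens the bucket can hold
    private final long refillIntervalNanos;  // one token is added per interval
    private final LongSupplier clock;        // hypothetical injectable nanosecond clock
    private long tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, long tokensPerSecond, LongSupplier clock) {
        if (capacity <= 0 || tokensPerSecond <= 0 || tokensPerSecond > 1_000_000_000L) {
            throw new IllegalArgumentException("capacity and rate must be positive");
        }
        this.capacity = capacity;
        this.refillIntervalNanos = 1_000_000_000L / tokensPerSecond;
        this.clock = clock;
        this.tokens = capacity;              // start full so an initial burst is allowed
        this.lastRefillNanos = clock.getAsLong();
    }

    /** Takes one token if available; false means the request is dropped. */
    synchronized boolean tryAcquire() {
        refill();
        if (tokens == 0) {
            return false;
        }
        tokens--;
        return true;
    }

    /** Adds the tokens earned since the last refill, never exceeding capacity. */
    private void refill() {
        long now = clock.getAsLong();
        long newTokens = (now - lastRefillNanos) / refillIntervalNanos;
        if (newTokens > 0) {
            tokens = Math.min(capacity, tokens + newTokens);
            lastRefillNanos += newTokens * refillIntervalNanos;
        }
    }
}
```

In production code the clock would typically be `System::nanoTime`. Tokens are refilled lazily on each call rather than by a background timer, which keeps the sketch free of threads.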
Leaky Bucket
A leaky bucket is a container that has a pre-defined capacity. Each request from the client is put in the bucket as it arrives. Requests are taken out of the bucket and processed at a constant rate. If the rate at which requests arrive is greater than the rate at which requests are processed, the bucket will fill up, and further requests will be dropped until there is space in the bucket.
Pros
- Memory efficient.
- Suitable for use cases where a stable outflow rate is required.
Cons
- Does not accommodate bursts/spikes in traffic, as some recent requests may be dropped in such cases.
- Needs to be adapted for a distributed system by achieving some atomicity when accessing the shared state of buckets.
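A leaky bucket can be sketched as a bounded queue that is drained at a constant rate. The example below is a minimal, single-JVM illustration. The class name `LeakyBucket` and its methods are assumptions made for the example, and the scheduler that calls `leak()` at a fixed rate is left to the caller.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Minimal leaky bucket modeled as a bounded FIFO queue (illustrative sketch only). */
final class LeakyBucket<T> {
    private final int capacity;                     // maximum number of waiting requests
    private final Queue<T> queue = new ArrayDeque<>();

    LeakyBucket(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Adds a request to the bucket; false means the bucket is full and the request is dropped. */
    synchronized boolean offer(T request) {
        if (queue.size() >= capacity) {
            return false;
        }
        return queue.add(request);
    }

    /** Removes the oldest waiting request, or returns null if the bucket is empty. */
    synchronized T leak() {
        return queue.poll();
    }
}
```

A `ScheduledExecutorService` calling `leak()` at a fixed rate would give the constant outflow; for example, one call every 100 milliseconds processes at most 10 requests per second.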
Rate Limiting at Scale
Desired
Here are some requirements for rate limiting at scale. Rate limiting should:
- Be very easy to set up.
- Support dynamic rate limiting on the fly. For example, conditional rate limiting based on both server state (e.g., JVM memory) and request details (e.g., IP address, user agent).
- Not noticeably increase response times.
- Support distributed systems.
- Be easy to maintain and evolve.
Needs Rate Limiting
Public facing pages, for example:
- Contact us and other such forms.
- Pages where users provide inputs that may need to be processed.
Background services which may suffer from traffic bursts:
- Image upload.
- Catalog upload.
Heavy lifting services:
- Order archive download.
- Other file download services.
Other public-facing pages, for example:
- Home page.
- Search page.
- Article details page.
Rate-Limiting Libraries
The following rate-limiting libraries can do the heavy lifting of rate-limiting.
Property | guava | bucket4j | resilience4j | flex | remarks |
---|---|---|---|---|---|
Easy to set up | | | | | |
Dynamic rate limiting on the fly | | | | | |
Decrease to response times | | | | | |
Supports distributed systems | | | | | |
Easy to maintain and evolve | | | | | |
From the above list of rate limiters, only the Flex rate limiter allows developers to express various conditions for rate limiting. Fluent expression of such conditions would look like this:
IF jvm.memory.available < 5G AND user.role = guest THEN 20 requests / second
ELSE IF jvm.memory.available < 3G THEN 10 requests / second
ELSE IF jvm.memory.available < 2G THEN 5 requests / second
Expressive conditions such as those displayed above allow the rate limiter to change the shape of traffic dynamically.
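To make the idea of conditional rates concrete, here is a small plain-Java sketch that resolves a limit from the three example conditions. It is not how the Flex rate limiter works internally. The classes `RequestContext` and `ConditionalRates` are hypothetical, and the sketch assumes that when several conditions match, the lowest limit applies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;
import java.util.function.Predicate;

/** Hypothetical snapshot of the values the conditions refer to. */
final class RequestContext {
    final long availableMemoryBytes;
    final String userRole;

    RequestContext(long availableMemoryBytes, String userRole) {
        this.availableMemoryBytes = availableMemoryBytes;
        this.userRole = userRole;
    }
}

/** Resolves the permitted rate from a list of (condition, permits) rules. */
final class ConditionalRates {
    static final long GB = 1024L * 1024 * 1024;

    private final List<Predicate<RequestContext>> conditions = new ArrayList<>();
    private final List<Integer> permitsPerSecond = new ArrayList<>();

    ConditionalRates when(Predicate<RequestContext> condition, int permits) {
        conditions.add(condition);
        permitsPerSecond.add(permits);
        return this;
    }

    /** Returns the lowest limit among matching rules (an assumption), or empty if none match. */
    OptionalInt resolve(RequestContext context) {
        OptionalInt result = OptionalInt.empty();
        for (int i = 0; i < conditions.size(); i++) {
            if (conditions.get(i).test(context)) {
                int permits = permitsPerSecond.get(i);
                if (!result.isPresent() || permits < result.getAsInt()) {
                    result = OptionalInt.of(permits);
                }
            }
        }
        return result;
    }

    /** The three example conditions from the text. */
    static ConditionalRates example() {
        return new ConditionalRates()
                .when(c -> c.availableMemoryBytes < 5 * GB && "guest".equals(c.userRole), 20)
                .when(c -> c.availableMemoryBytes < 3 * GB, 10)
                .when(c -> c.availableMemoryBytes < 2 * GB, 5);
    }
}
```

The lowest-matching-limit rule was chosen because, read as a strict first-match chain, the last branch could never fire: any value below 2G is also below 3G.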
Flex Rate Limiter
Flex rate limiter enables engineers to fluently express rate conditions. The rate limiter is then able to dynamically respond to changes in traffic. The Flex rate limiter is based on Google Guava. However, rate limiting is not locked into the built-in rate limiter. Third-party rate limiters could be used and still enjoy the power and simplicity provided by the **Flex rate limiter**.
Example Usage of Flex Rate Limiter
The **Flex rate limiter** allows for limits to be specified using annotations. For example, two permits per second when the user is not logged in AND the JVM available memory is less than 1.7GB.
Java
@Controller
@RequestMapping("/api")
class GreetingResource {

    @Rate(permits = 2, condition = "web.request.user.role=GUEST & jvm.memory.available<1.7GB")
    @GetMapping("/smile")
    String smile() {
        return ":)";
    }
}
A More Contrived Example of Flex Rate Limiter
Unlike other rate limiters, the **Flex rate limiter** allows rate limiting based on complex conditions. For example:
IF jvm.memory.available < 5G AND user.role = guest THEN 20 requests / second
ELSE IF jvm.memory.available < 3G THEN 10 requests / second
ELSE IF jvm.memory.available < 2G THEN 5 requests / second
The above could be expressed as:
Java
@Controller
@RequestMapping("/api")
class GreetingResource {

    @Rate(permits = 20, condition = "jvm.memory.available < 5G & web.request.user.role = GUEST")
    @Rate(permits = 10, condition = "jvm.memory.available < 3G")
    @Rate(permits = 5, condition = "jvm.memory.available < 2G")
    @GetMapping("/smile")
    public String smile() {
        return ":)";
    }
}
Flex Rate Limiter Is Based on Three Major Pillars
- **Flexibility:** Use of annotations as well as a flexible and expressive language for rate conditions.
- **Evolvability:** Modular design and non-exposure of implementation details.
- **Ease of use:** Minimum setup.
Flexibility
Multiple rates may be specified per class or method. The rates at the class level apply to all methods in the class. Using multiple rate conditions, such as those displayed below, allows the rate limiter to dynamically change the shape of traffic. As a result, rate limiting is more responsive, easier to maintain, and boosts developer productivity.
Java
@Rate(10) // 10 permits per second for all methods in this class
@Controller
@RequestMapping("/api/v1")
public class GreetingResource {

    @Rate(permits=1, condition="web.request.user.role=GUEST")
    @Rate(permits=5, condition="web.request.user.role=USER")
    @GetMapping("/smile")
    public String smile() {
        return ":)";
    }

    @Rate(permits=5, timeUnit=TimeUnit.MINUTES, condition="sys.memory.available<1gb")
    @Rate(permits=2, condition="web.request.parameter={viewOptions$#/profile/}")
    @GetMapping("/greet")
    public String greet(@RequestParam("who") String who) {
        return "Hello " + who;
    }
}
Composite rates could be built and re-used multiple times. For example, the conditional limit below (i.e., LimitIfNotGermany) may be used multiple times on different classes/methods.
Java
@Rate(condition = "web.request.locale != [de_DE|de]", permits = 5)
@RateGroup("not-germany")
@Retention(RetentionPolicy.RUNTIME)
@Target({ ElementType.TYPE, ElementType.METHOD, ElementType.ANNOTATION_TYPE})
@interface LimitIfNotGermany{ }

@Controller
@RequestMapping("/api")
class GreetingResource {

    @LimitIfNotGermany
    @GetMapping("/smile")
    String smile() {
        return ":)";
    }
}
The flexibility offered by the Flex rate limiter is made possible by its Rate Condition Expression Language, which supports conditions like web.request.cookie=<cookie-name> using the following tokens and operators:
Tokens
- web.request: attribute, auth.scheme, cookie, header, locale, parameter, remote.address, uri, user.principal, user.role
- web.session: id
- jvm.memory: available, free, max, total, used
- jvm.thread.count: daemon, deadlocked, deadlocked.monitor, peak, started
- jvm.thread.current: count.blocked, count.waited, state, suspended, time.blocked, time.cpu, time.user, time.waited
- sys.environment
- sys.property
- sys.time: current, elapsed
Operators
= EQUALS
> GREATER
>= GREATER_OR_EQUALS
< LESS
<= LESS_OR_EQUALS
% LIKE
^ STARTS_WITH
$ ENDS_WITH
! NOT (Negates other operators, e.g., != or !%)
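As an illustration of how operators like these might be evaluated, here is a small plain-Java sketch. It is not the parser used by the Flex rate limiter. The class `ConditionOperators` is hypothetical, values are passed as plain strings, and treating LIKE as a substring match is an assumption made for the example.

```java
/** Evaluates a single "left operator right" comparison (illustrative sketch only). */
final class ConditionOperators {

    private ConditionOperators() { }

    static boolean evaluate(String left, String operator, String right) {
        // A leading '!' negates the operator that follows it, e.g. "!=" or "!%".
        if (operator.length() > 1 && operator.startsWith("!")) {
            return !evaluate(left, operator.substring(1), right);
        }
        switch (operator) {
            case "=":  return left.equals(right);
            case ">":  return compare(left, right) > 0;
            case ">=": return compare(left, right) >= 0;
            case "<":  return compare(left, right) < 0;
            case "<=": return compare(left, right) <= 0;
            case "%":  return left.contains(right);    // assumption: LIKE means "contains"
            case "^":  return left.startsWith(right);
            case "$":  return left.endsWith(right);
            default:   throw new IllegalArgumentException("Unknown operator: " + operator);
        }
    }

    /** Ordering operators compare numerically in this sketch. */
    private static int compare(String left, String right) {
        return Double.compare(Double.parseDouble(left), Double.parseDouble(right));
    }
}
```

For example, `ConditionOperators.evaluate("/api/smile", "^", "/api")` checks whether a request URI starts with `/api`.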
Evolvability
Flex rate limiter has a modular design, which includes the following modules:
- rate-limiter: Core module. Inspired by the Guava rate limiter. Adapted for distributed systems.
- rate-limiter-annotation: Annotation module. Built on the core module to support annotations.
- rate-limiter-web-core: Web module. Built on the annotation module to support Java web-based systems.
- rate-limiter-spring: Spring module. Built on the web module, based on the Spring framework.
- rate-limiter-javaee: Java EE module. Built on the web module, based on Java EE specs.
Rate limiting is not locked into the in-built rate limiter. Third-party rate limiters could be used and still enjoy the power and simplicity provided by annotations and Rate Condition Expression Language.
In addition, to prevent tight coupling, the core modules (i.e., rate-limiter, rate-limiter-annotation, and rate-limiter-web-core) do not expose implementation details.
Ease of Use
Here is how a Spring Boot application could easily set up rate limiting using the Flex rate limiter.
Java
@SpringBootApplication
@EnableConfigurationProperties(MyApp.MyRateLimitProperties.class)
public class MyApp {

    public static void main(String[] args) {
        SpringApplication.run(MyApp.class, args);
    }

    // Nested classes are static so Spring can instantiate them without an enclosing MyApp instance.
    @ConfigurationProperties(prefix = "rate-limiter", ignoreUnknownFields = false)
    public static class MyRateLimitProperties extends RateLimitPropertiesSpring { }

    @Component
    public static class MyAppFilter extends ResourceLimitingFilter {

        public MyAppFilter(RateLimitProperties properties) {
            super(properties);
        }

        // sendError throws the checked IOException, so it is declared here;
        // this assumes the overridden method permits it.
        @Override
        protected void onLimitExceeded(
                HttpServletRequest request, HttpServletResponse response, FilterChain chain)
                throws IOException {
            response.sendError(429, "Too many requests");
        }
    }
}
Here are example rate-limit properties:
YAML
rate-limiter:
  resource-packages: com.myapplicatioon.web.rest
  rate-limit-configs:
    task_queue: # Accept only 2 tasks per second
      permits: 2
      duration: PT1S
    video_download: # Cap streaming of video to 5kb per second
      permits: 5000
      duration: PT1S
    com.myapplicatioon.web.rest.MyResource: # Limit requests to this resource to 10 per minute
      permits: 10
      duration: PT1M
Putting It All Together
Bot control mechanisms and CAPTCHA are often used to protect resources. This section will re-imagine such systems with rate limiting introduced. The aim of introducing rate limiting in general, and a dynamic rate limiter in particular, is to stay ahead of the curve. Instead of spending valuable hours firefighting, developers can focus on what they love doing.
Staying Ahead of the Curve
Today, bots have a variety of tricks up their sleeves, including using multiple user agents, IP addresses, rate-limiting detection, etc. Rate limiting detection often involves sending packets as fast as possible for long enough to trigger rate limiting. Thereafter, requests are sent just under the limit to evade detection as a bot. To counter these tricks, we could use dynamic rate limiting provided by the Flex rate limiter as well as a bot trap.
Dynamic Rate Limiting
Dynamic rate limiting involves triggering rate limiting conditionally. Conditions based solely on client attributes, such as the client’s IP address, are not effective, since bots can rotate them. The Flex rate limiter allows rate limiting based on conditions that the client is not privy to, for example, JVM memory state. This makes it harder for the client to detect rate limiting, because the condition for rate limiting changes based on factors outside the client’s control.
Bot-Trap
A bot trap is a link with text hidden from human vision that only bots are able to click/follow. The text could be hidden by giving it the same color as the web page’s background color. Any user who follows the link, which is invisible to humans, is marked as a bot.
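On the server side, a bot trap amounts to remembering which clients have requested the hidden link. The sketch below is a minimal in-memory illustration. The class `BotTrap`, the trap path, and the client identifier (which could be a session ID, for instance) are all assumptions made for the example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Flags clients that follow a link humans cannot see (illustrative sketch only). */
final class BotTrap {
    private final String trapPath;   // hypothetical URL of the hidden link
    private final Set<String> flaggedClients = ConcurrentHashMap.newKeySet();

    BotTrap(String trapPath) {
        this.trapPath = trapPath;
    }

    /** Records the request and returns true if this client has ever followed the trap link. */
    boolean isBot(String clientId, String requestedPath) {
        if (trapPath.equals(requestedPath)) {
            flaggedClients.add(clientId);
        }
        return flaggedClients.contains(clientId);
    }
}
```

A real deployment would also need to expire flagged entries and share them across servers; both are omitted here.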
Custom Solutions per Use Case
Public Facing Pages
A robust solution would involve dynamic rate limiting to evade rate limiting detection. All requests which exceed the limit (probably bots) are redirected to a CAPTCHA page. The CAPTCHA page contains the bot trap at the very top. Bots click the trap without even attempting the CAPTCHA challenge. This means humans may not need to solve the challenge. Anyone who does not click the bot trap is probably human and is redirected to the desired resource.
- Home page.
- Search page.
- Article details page.
Background Services May Suffer From Traffic Bursts
Rate limiting with traffic smoothing. In this case, requests are not dropped but delayed depending on various conditions. This acts like a queue that is privy to the server memory/responsiveness state.
- Catalog upload.
- Order upload.
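Traffic smoothing, where requests are delayed instead of dropped, can be sketched as follows. Each request reserves the next free time slot and is told how long to wait. The class `SmoothingLimiter` and its injectable nanosecond clock are assumptions made for the example; adjusting the rate based on server state is not shown.

```java
import java.util.function.LongSupplier;

/** Spaces requests evenly by delaying them instead of dropping them (illustrative sketch only). */
final class SmoothingLimiter {
    private final long intervalNanos;   // minimum spacing between two requests
    private final LongSupplier clock;   // hypothetical injectable nanosecond clock
    private long nextFreeNanos;         // earliest time the next request may proceed

    SmoothingLimiter(long permitsPerSecond, LongSupplier clock) {
        if (permitsPerSecond <= 0 || permitsPerSecond > 1_000_000_000L) {
            throw new IllegalArgumentException("rate must be between 1 and 1e9 per second");
        }
        this.intervalNanos = 1_000_000_000L / permitsPerSecond;
        this.clock = clock;
        this.nextFreeNanos = clock.getAsLong();
    }

    /** Reserves the next slot and returns how long the caller should wait, in nanoseconds. */
    synchronized long reserve() {
        long now = clock.getAsLong();
        long slot = Math.max(now, nextFreeNanos);
        nextFreeNanos = slot + intervalNanos;
        return slot - now;
    }
}
```

The caller would sleep, or schedule the work, for the returned delay before processing the request, so a burst is spread out over time rather than rejected.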
Heavy Lifting Services
Plain old vanilla rate limiting. The user gets a limit-exceeded response (e.g., HTTP 429 Too Many Requests) when there are too many requests. This should not be surprising to the user, as the requested resource in such cases is large.
- Order archive download.
- Other file download services.
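Vanilla rate limiting of this kind can be sketched with a fixed window counter, one of the algorithms listed earlier. This is a minimal single-JVM illustration. The class `FixedWindowLimiter` and its injectable millisecond clock are assumptions made for the example.

```java
import java.util.function.LongSupplier;

/** Allows at most `limit` requests per time window (illustrative sketch only). */
final class FixedWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private final LongSupplier clock;   // hypothetical injectable millisecond clock
    private long windowStart;
    private int count;

    FixedWindowLimiter(int limit, long windowMillis, LongSupplier clock) {
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    /** Returns false when the limit for the current window is exhausted. */
    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        if (now - windowStart >= windowMillis) {
            windowStart = now;   // start a new window and reset the counter
            count = 0;
        }
        if (count >= limit) {
            return false;
        }
        count++;
        return true;
    }
}
```

When `tryAcquire()` returns false, a web application would answer with a limit-exceeded response such as HTTP 429.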
Architecture
Comparison
Property | Central control | Side car | Remarks |
---|---|---|---|
Can access state of target server (e.g., JVM memory) | | | Accessing the state of the target server allows for conditional rate limiting based on important metrics like JVM memory. |
Rate limiter may serve applications written in other languages | | | |
Rate limiter may be scaled independently of the application | | | |
Ease of implementation and maintenance | | | Once a filter is set up, any application that imports that filter inherits automatic rate limiting. All that is needed is to add annotations and/or properties specifying rates and conditions for limiting resources. |
Low latency | | | |
Whereas central control is the more advantageous of the two patterns compared above, its implementation requires the setup of a control plane. A control plane is not trivial to set up. On the other hand, the possible latency issue of the “quasi” sidecar pattern (due to the shared cache) could be mitigated by asynchronous-eventual-rate-limiting.
Improving Latency
Asynchronous-eventual-rate-limiting means the following:
- The call to the shared rate limit cache is made asynchronously. This way, requests are not blocked by the same process (rate limiting) that was intended to keep the system responsive.
- A major implication of the asynchronous call is that rate-of-use data will not be strongly consistent but rather eventually consistent.
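The two points above can be sketched in plain Java. The decision is made from the last known total, while the shared counter is updated in the background. The class `EventualRateLimiter` is hypothetical, the `sharedIncrement` function stands in for a call to whatever shared cache is used, and resetting the count at the end of each time window is left out for brevity.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongUnaryOperator;

/** Decides from possibly stale data and updates the shared count asynchronously (sketch only). */
final class EventualRateLimiter {
    private final long limit;
    private final LongUnaryOperator sharedIncrement;   // hypothetical: adds a delta to the shared counter, returns the new total
    private final Executor executor;
    private final AtomicLong lastKnownTotal = new AtomicLong();

    EventualRateLimiter(long limit, LongUnaryOperator sharedIncrement, Executor executor) {
        this.limit = limit;
        this.sharedIncrement = sharedIncrement;
        this.executor = executor;
    }

    /** Never waits for the shared cache; the result is based on the last total seen. */
    boolean tryAcquire() {
        boolean allowed = lastKnownTotal.get() < limit;
        if (allowed) {
            CompletableFuture
                    .supplyAsync(() -> sharedIncrement.applyAsLong(1L), executor)
                    // Keep the highest total seen, in case responses arrive out of order.
                    .thenAccept(total -> lastKnownTotal.accumulateAndGet(total, Math::max));
        }
        return allowed;
    }
}
```

Because the local total lags behind the shared one, a few requests over the limit may slip through before the count catches up. That is the eventual-consistency trade-off described above.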
Conclusion
It is easy to design a system that is 95% resilient. However, moving the resiliency dial to 99.99% requires a well-architected system. There are various resiliency mechanisms that control and shape traffic. Whenever such control is lacking or ineffective, developer productivity suffers, as engineers spend more time responding to platform incidents. Existing controls could benefit from rate limiting, as long as rate limiting does not introduce unnecessary complexity or increase response time noticeably. Accordingly, the Flex rate limiter was presented as a suitable option for improving the resilience of distributed systems. Using the Flex rate limiter and the “quasi” sidecar pattern, rate limiting could be easily set up to protect vulnerable resources. In addition, asynchronous-eventual-rate-limiting could be used to ensure low latency when rate-limiting distributed systems with a shared cache.
Let us drink to the day when this kind of response will be no more, knowing that day may never come, and we may forever be hungover.
References
- Flex rate limiter
- Rate condition expression language: web specification and core specification
- Systems Design