Rate Limiting Multi-Tenant Environments with the Token Bucket Algorithm on AWS
12 January, 2025
In a multi-tenant SaaS environment, it is often necessary to rate limit certain actions to prevent any single tenant from overwhelming shared resources. This document discusses how the token bucket algorithm works and presents architectures for implementing it using AWS services.
What is the token bucket algorithm?
The token bucket algorithm is a rate-limiting mechanism used to control the flow of data or requests in computer networks and systems. It works by regulating the rate at which consuming processes can perform work.
The token bucket algorithm has two main parameters:
- Bucket size: This determines the maximum number of tokens (or burst capacity) that can be held in the bucket at any given time.
- Fill rate: This determines the rate at which new tokens are added to the bucket, effectively controlling the long-term average rate at which work can be performed.
To perform work, a consuming process must remove a token from the bucket. If the bucket is not empty, the process can perform the work immediately. However, if the bucket is empty, the process must wait until a new token is added before it can proceed.
The tokens in the bucket are opaque and fungible, meaning that they have no distinct identity or characteristics beyond representing the right to perform a unit of work.
The algorithm gets its name from the analogy of a bucket that can hold a predefined number of tokens. Imagine a token dispenser that drops tokens into the bucket at a constant rate. Each token is like a ticket to a show or event. To perform work, a consuming process needs to hold a token (or ticket) from the bucket. If the bucket is empty, the process can't perform work until a new token is added.
By adjusting the bucket size and fill rate parameters, you can control the burstiness and long-term rate at which work can be performed. A larger bucket size allows for more bursty workloads, while a higher fill rate allows for a higher long-term rate of work.
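To make those two parameters concrete before we move to a distributed version, here's a minimal, single-process sketch of the algorithm in Python (the class and method names are illustrative, not from any particular library):

```python
import time

class TokenBucket:
    def __init__(self, bucket_size: int, fill_rate: float):
        self.bucket_size = bucket_size      # maximum tokens (burst capacity)
        self.fill_rate = fill_rate          # tokens added per second
        self.tokens = float(bucket_size)    # start with a full bucket
        self.last_refill = time.monotonic()

    def _refill(self):
        # Add the tokens owed since the last refill, capped at the bucket size.
        now = time.monotonic()
        self.tokens = min(self.bucket_size,
                          self.tokens + (now - self.last_refill) * self.fill_rate)
        self.last_refill = now

    def try_claim(self, tokens: int = 1) -> bool:
        # Claim tokens if available; otherwise the caller must wait and retry.
        self._refill()
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False

# Allow bursts of up to 100 units of work, refilling at 5 tokens per second.
bucket = TokenBucket(bucket_size=100, fill_rate=5.0)
if bucket.try_claim():
    pass  # perform the unit of work
```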
The token bucket algorithm is commonly used in network traffic shaping, bandwidth management, and rate-limiting scenarios, where it helps to smooth out bursty traffic and prevent resources from being overwhelmed by sudden spikes in demand.
Implementing a token bucket with AWS services
In a multi-tenant SaaS environment, you may need to rate limit certain actions for each tenant to ensure fair usage of shared resources. An SQS queue is a great candidate for implementing a token bucket for this purpose: queues support multiple disparate consumers, they are highly durable and affordable, and we can create an effectively unlimited number of them.
In a multi-tenant SaaS environment your control plane could provision a queue per tenant/action pairing during tenant onboarding. Each queue could be created following a consistent naming convention like {action}_{tenant-id}_token_bucket. Consumer code will be tenant aware and will therefore know how to dynamically select the correct queue to read from.
For example, let's say a new tenant has just signed up and has been assigned tenant id t1234. If your application defines two rate-limited actions: schedule-email (allowing the tenant to schedule a bulk email send to their customers) and invoke-doc-summary-ai (allowing the tenant to create generative AI summaries of uploaded documents), your control plane would provision two new SQS queues named schedule-email_t1234_token_bucket and invoke-doc-summary-ai_t1234_token_bucket.
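As a sketch, the onboarding step in your control plane might provision these queues with boto3 along these lines (the tenant id, action names, and tags are illustrative):

```python
import boto3

sqs = boto3.client("sqs")

TENANT_ID = "t1234"
RATE_LIMITED_ACTIONS = ["schedule-email", "invoke-doc-summary-ai"]

for action in RATE_LIMITED_ACTIONS:
    sqs.create_queue(
        QueueName=f"{action}_{TENANT_ID}_token_bucket",
        # Tagging by tenant makes per-tenant cost tracking and cleanup easier.
        tags={"tenant-id": TENANT_ID, "action": action},
    )
```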
Producing tokens with EventBridge Scheduler and Lambda
Now that we have our queues (buckets), we need a way to automate the sending of messages (tokens) at the rates we want to limit each action to. EventBridge Scheduler is an AWS service that allows you to create schedules that trigger other AWS services at specified times or intervals. We can use EventBridge Scheduler to trigger a Lambda function at our desired token growth rate.
Each of our token queues (buckets) needs to be filled at the rate we want to limit the associated action to. As SQS queues are effectively uncapped in terms of their maximum allowed size, we also need to control our fill rate in such a way that we don't grow our queues endlessly.
The Lambda function’s job is to iterate over our tenants, fetching data from a data store to determine which access tier each tenant is currently assigned to. Then the appropriate number of tokens is added to each tenant’s token queue based on the growth rate defined for that action within the tier.
Before tokens are sent to the queue, the Lambda function must first inspect the current number of tokens in the queue and only add tokens if the queue hasn’t exceeded its maximum allowed size. The maximum number of tokens you allow in the bucket effectively controls burst behaviour. If you allow up to 1000 tokens in your queue, then any single consumer can claim all 1000 tokens at once. Put another way, consumers can burst at full throughput until no tokens remain, at which point they are throttled until more tokens become available.
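A sketch of such a fill function follows. The tier table and get_tenant_tiers lookup are hypothetical stand-ins for your own data store, and note that ApproximateNumberOfMessages is, as the name suggests, approximate:

```python
import json
import boto3

sqs = boto3.client("sqs")

# Hypothetical per-tier configuration: tokens added per run and bucket size.
TIER_CONFIG = {
    "free":    {"tokens_per_run": 1,  "max_tokens": 10},
    "premium": {"tokens_per_run": 10, "max_tokens": 1000},
}

def get_tenant_tiers():
    # Stand-in for fetching tenant -> tier mappings from your data store.
    return {"t1234": "premium"}

def handler(event, context):
    # Assumes the schedule passes the action it produces tokens for.
    action = event["action"]  # e.g. "schedule-email"
    for tenant_id, tier in get_tenant_tiers().items():
        config = TIER_CONFIG[tier]
        queue_url = sqs.get_queue_url(
            QueueName=f"{action}_{tenant_id}_token_bucket")["QueueUrl"]

        # Check the current depth so the bucket never exceeds its burst capacity.
        attrs = sqs.get_queue_attributes(
            QueueUrl=queue_url, AttributeNames=["ApproximateNumberOfMessages"])
        depth = int(attrs["Attributes"]["ApproximateNumberOfMessages"])

        to_add = min(config["tokens_per_run"], config["max_tokens"] - depth)
        if to_add > 0:
            sqs.send_message_batch(
                QueueUrl=queue_url,
                Entries=[{"Id": str(i), "MessageBody": json.dumps({"token": True})}
                         for i in range(to_add)],
            )
```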
An alternative architecture is to create a schedule per tenant/action pair. This allows you to configure the growth rate on the schedule itself as each schedule resource belongs to a single tenant. If a tenant switches tiers, the tenant’s schedules can be adjusted to produce tokens at the rate defined by the tier.
As each tenant gets its own schedule, we can also do away with the Lambda function and send messages directly to the tenant-owned token queue by configuring our schedule with a universal target that calls the SendMessageBatch API.
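Creating one such per-tenant schedule might look roughly like this (the account id, role ARN, queue URL, and the 5-token rate are all placeholders):

```python
import json
import boto3

scheduler = boto3.client("scheduler")

queue_url = ("https://sqs.us-east-1.amazonaws.com/123456789012/"
             "schedule-email_t1234_token_bucket")

scheduler.create_schedule(
    Name="schedule-email_t1234_token_producer",
    ScheduleExpression="rate(1 minute)",
    FlexibleTimeWindow={"Mode": "OFF"},
    Target={
        # Universal target: call the SQS SendMessageBatch API directly.
        "Arn": "arn:aws:scheduler:::aws-sdk:sqs:sendMessageBatch",
        "RoleArn": "arn:aws:iam::123456789012:role/scheduler-sqs-role",
        "Input": json.dumps({
            "QueueUrl": queue_url,
            # Five tokens per minute for this tenant's tier.
            "Entries": [{"Id": str(i), "MessageBody": "token"}
                        for i in range(5)],
        }),
    },
)
```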
As in the previous architecture we still need a way to control the queue length to prevent it growing indefinitely. We can achieve this by configuring the retention period on the queue so that unclaimed messages will automatically expire.
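For instance, setting the retention period to 15 minutes means unclaimed tokens expire after 15 minutes, effectively capping the bucket size at the fill rate multiplied by the retention period (the 900-second value is an illustrative choice; SQS allows anywhere from 60 seconds to 14 days):

```python
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.get_queue_url(
    QueueName="schedule-email_t1234_token_bucket")["QueueUrl"]

# Unclaimed tokens expire after 15 minutes, capping the effective bucket
# size at (fill rate x retention period).
sqs.set_queue_attributes(
    QueueUrl=queue_url,
    Attributes={"MessageRetentionPeriod": "900"},
)
```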
There are some caveats with this approach: EventBridge Scheduler can trigger at most once per minute, and the SendMessageBatch API can send a maximum of 10 messages per call, so the maximum growth rate with this architecture is 10 messages (tokens) per minute, or 14,400 per day. While it’s possible to have up to 1 million schedules per AWS account (a quota that can be raised on request), it’s important to consider that each invocation comes with an associated cost. If each tenant has dozens of schedules and you have many tenants, this cost can add up.
Claiming tokens
The compute that executes our rate-limited actions simply needs to read and delete tokens from our queue to consume them. As mentioned earlier, tokens should ideally be opaque and fungible, though encapsulating additional data like a unique token id, the token creation time, and the associated action identifier may prove useful from a logging perspective.
Once a consumer has retrieved a token, it is free to perform the associated action. It's worth mentioning that tokens don't necessarily need to correspond one-to-one with a single action. A consumer may know ahead of time that specific jobs require more work than others of the same type and can therefore claim multiple tokens at once.
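A consumer sketch, assuming the same queue naming convention (claim_tokens and perform_action are hypothetical helpers):

```python
import boto3

sqs = boto3.client("sqs")

def claim_tokens(action: str, tenant_id: str, count: int = 1) -> int:
    """Claim up to `count` tokens; returns the number actually claimed."""
    queue_url = sqs.get_queue_url(
        QueueName=f"{action}_{tenant_id}_token_bucket")["QueueUrl"]
    resp = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=min(count, 10))
    messages = resp.get("Messages", [])
    for msg in messages:
        # Deleting the message is what actually consumes the token.
        sqs.delete_message(QueueUrl=queue_url,
                           ReceiptHandle=msg["ReceiptHandle"])
    return len(messages)

def perform_action():
    print("doing the rate-limited work")  # placeholder for the real action

if claim_tokens("schedule-email", "t1234") == 1:
    perform_action()
# else: throttled; back off and retry once more tokens are produced
```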
Conclusion
The token bucket algorithm is a useful mechanism for rate-limiting actions in a multi-tenant SaaS environment. By implementing token buckets using SQS queues and automatically producing tokens at controlled rates with EventBridge Scheduler and Lambda functions (or direct SendMessageBatch calls), you can ensure fair usage of shared resources across all tenants while still allowing for bursts of activity within defined limits. The architectures presented in this document provide flexible options for implementing token bucket rate limiting tailored to your specific requirements.