Exploring Multi-Tenant SaaS Architecture on AWS
14 December, 2022
AWS offers a wide range of services and tools that are well suited to building and running a multi-tenant software-as-a-service (SaaS) application. These include services for compute, storage, database, analytics, security, and more, which can help speed up development and reduce the time and effort required to build and manage a SaaS application.
With pay-as-you-go pricing, AWS lets you pay only for the services and resources that you actually use. This can help you save money and avoid over-provisioning, as you can easily scale up or down based on the needs of your SaaS application.
In this post, we will explore some of the key considerations for architecting for multi-tenancy, including deployment and isolation models, data-partitioning, and operations. By gaining a deep understanding of the unique nuances which underpin multi-tenant applications, developers and architects can create systems which are robust, scalable, and capable of meeting the needs of a diverse range of users.
What is a "tenant" exactly?
A tenant is a customer that has their own unique instance of the software, with its own configuration and data. This allows each tenant to use the software in a way that is customised to their specific needs and requirements. Within a SaaS application, many tenants co-exist, albeit often with strict isolation boundaries around them.
The words "tenant" and "user" are sometimes used interchangeably, but that is not always the case. A tenant is a logical entity that represents a distinct, isolated namespace within a shared application. This namespace typically includes specific users and data, and it is bounded by strict access controls. Additionally, a tenant is often a billing construct, associated with a particular usage tier and service level agreements (SLAs) that dictate the terms of service. In short, a tenant is a way to organise and manage access to shared resources within a multi-tenant application.
The two halves of SaaS
When building a SaaS application, AWS recommends splitting the system into two distinct planes: the Application Plane and the Control Plane.
Control Plane
The control plane is responsible for coordinating the overall application. As tenants are created, either via self-signup or via an administrator, the control plane is responsible for bootstrapping the tenant within the system.
It's worth mentioning the control plane itself is not necessarily a single monolithic service, but rather a collection of services such as tenant management, identity, metrics, billing and more.
Administrators of the SaaS application have access to the control plane via a dedicated interface not accessible to tenants. This interface may be as simple as a command-line interface or a more fully featured browser-based UI. From here, administrators can perform tasks such as viewing and modifying tenant attributes, and can view dashboards showing how the system is being used at service-wide and tenant-specific levels.
Application Plane
The application plane is a collection of services which make up the bulk of the user-facing application. It is the home of all tenant functionality within the SaaS application.
To effectively manage a multi-tenant application, it is essential that the application is instrumented to emit telemetry data back to the control plane. This allows the control plane to monitor the performance and behaviour of the application in real time, and to identify and address any potential issues or anomalies. To enable this, all traces and logs generated by the application should include the tenant's unique identifier, so that every signal can be attributed to the tenant that produced it.
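One way to make sure the tenant's identifier ends up on every log line is to attach it centrally rather than relying on each call site to remember it. The following is a minimal sketch using only Python's standard logging module. The names current_tenant, TenantFilter, JsonFormatter, build_logger and handle_request are made up for illustration, and the JSON field names are an assumption rather than any AWS convention.

```python
import contextvars
import io
import json
import logging

# Tenant id for the request currently being handled (hypothetical name).
current_tenant = contextvars.ContextVar("current_tenant", default="unknown")


class TenantFilter(logging.Filter):
    """Copy the tenant id from the request context onto every log record."""

    def filter(self, record):
        record.tenant_id = current_tenant.get()
        return True  # never drop the record, only enrich it


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line so logs can be indexed by tenant."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "tenant_id": record.tenant_id,
            "message": record.getMessage(),
        })


def build_logger(stream):
    """Create a logger whose handler enriches and formats every record."""
    logger = logging.getLogger("app")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.addFilter(TenantFilter())
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def handle_request(logger, tenant_id):
    """Simulate a request: set the tenant context, log, then restore it."""
    token = current_tenant.set(tenant_id)
    try:
        logger.info("order created")
    finally:
        current_tenant.reset(token)


if __name__ == "__main__":
    buffer = io.StringIO()
    handle_request(build_logger(buffer), "12345")
    print(buffer.getvalue(), end="")
```

Because the tenant id lives in a context variable, application code can keep calling logger.info as usual. The same idea extends to traces and metrics, although the details depend on the telemetry library in use.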
Describing SaaS environments
The components of a SaaS application may be composed from wildly different services, and thus their deployment models may vary from service-to-service. The deployment model to choose is often motivated by cost, performance or security.
Silo
A siloed environment is where a tenant gets dedicated resources assigned to them. This may be dedicated compute instances or databases or, in more extreme cases, dedicated VPCs or entire AWS accounts.
Siloed environments typically have increased customer onboarding complexity, as resources will need to be provisioned and configured before a customer can become productive. Because of this increased operational complexity, it is common to see siloed environments reserved only for customers on the highest pricing tiers.
A tradeoff under a siloed model is that you miss out on the cost efficiencies of a shared model. Resources may end up being over-provisioned and under-utilised, which risks eroding your profit margins. That said, attributing costs to tenants is typically easier under a siloed model than a pooled model (outlined in more detail below). For example, siloed resources can have the owning tenant's id attached directly via tags, which can then be used to generate per-tenant utilisation reports.
Pool
A pooled environment is where tenants share the same underlying resources. This is often easier to manage from an operations point-of-view, as it means there is typically only a single deployment of any given service. For example, if a service needs to be updated, a deployment will affect all tenants sharing the service at once. Contrast this with a siloed approach where updates would need to be rolled out to each siloed environment individually, which may be more time consuming and error prone.
Pooled environments may be more prone to usage contention by 'noisy neighbours'. For example, a tenant performing computationally expensive operations, like a bulk-import, may negatively impact the performance of other tenants which, in turn, reflects poorly on the reputation of you, the SaaS provider. In this case, special attention may need to be given to how you throttle the access of individual tenants. This is where SLAs become important, which are typically tied to the tier each tenant has subscribed to.
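To make tier-based throttling a little more concrete, here is a minimal in-memory sketch of a per-tenant token bucket. The tier names and limits in TIER_LIMITS are invented for illustration, as are the TokenBucket and TenantThrottle classes. A real pooled service running on more than one instance would need shared state or a managed throttling feature instead of a local dictionary.

```python
import time

# Hypothetical tiers: (burst capacity, refill rate in requests per second).
TIER_LIMITS = {"basic": (5, 1.0), "premium": (50, 10.0)}


class TokenBucket:
    """Classic token bucket: each request spends one token, tokens refill over time."""

    def __init__(self, capacity, refill_per_sec, now):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.updated_at = now

    def allow(self, now):
        # Top up the bucket for the time elapsed since the last call.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class TenantThrottle:
    """Keeps one bucket per tenant so a noisy tenant only exhausts its own budget."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._buckets = {}

    def allow(self, tenant_id, tier):
        now = self._clock()
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            capacity, rate = TIER_LIMITS[tier]
            bucket = self._buckets[tenant_id] = TokenBucket(capacity, rate, now)
        return bucket.allow(now)


if __name__ == "__main__":
    throttle = TenantThrottle()
    for attempt in range(7):
        print(attempt, throttle.allow("noisy-tenant", "basic"))
    # A different tenant is unaffected by the noisy one.
    print(throttle.allow("quiet-tenant", "basic"))
```

The clock is injectable so the behaviour can be checked without sleeping. Rejected requests would typically be surfaced to the caller as an HTTP 429 response.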
Attributing utilisation costs to individual tenants is also non-trivial, as it requires custom instrumentation at the software level. On the flip side, pooled resources benefit from economies of scale, which lowers costs and improves operational efficiency. This is particularly useful for customers on lower pricing tiers where utilisation may be intermittent.
Isolation and data partitioning
One of the most important design considerations you must make is ensuring that your tenants are properly isolated from each other. This is especially important in a pooled environment, where multiple customers may share underlying resources.
Each customer's data must always remain secure and protected from unauthorised access, and it's important to ensure that any changes or modifications made by one customer do not affect the data or environment of other customers. Isolation is an important aspect of SaaS, as it allows customers to use the service without worrying about the security or integrity of their data.
Within AWS, Identity and Access Management (IAM) provides an excellent mechanism to limit access to resources. This can be achieved by creating dedicated IAM roles during onboarding. These roles would be scoped down to only allow access to resources belonging to the tenant. For example, consider a policy attached to an IAM role which limits access to a DynamoDB table based on the tenant's id. Using dynamodb:LeadingKeys in the policy's condition limits the role's access to only records stored using the partition key TENANT#12345 (where the tenant's id is 12345).
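As a rough illustration, a policy of this shape could be generated per tenant during onboarding. The sketch below builds the policy document as a plain Python dictionary. The function name tenant_table_policy, the list of allowed actions, and the table ARN (including the account id and table name) are assumptions made for the example. The dynamodb:LeadingKeys condition key and the TENANT# prefix follow the description above.

```python
import json


def tenant_table_policy(tenant_id, table_arn):
    """Build an IAM policy document restricting a role to one tenant's items.

    The action list is illustrative; grant only what the service needs.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:PutItem",
                    "dynamodb:UpdateItem",
                    "dynamodb:DeleteItem",
                    "dynamodb:Query",
                ],
                "Resource": [table_arn],
                "Condition": {
                    # Every partition key touched by the request must equal
                    # this tenant's key, otherwise the request is denied.
                    "ForAllValues:StringEquals": {
                        "dynamodb:LeadingKeys": [f"TENANT#{tenant_id}"]
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical table ARN, used only for demonstration.
    arn = "arn:aws:dynamodb:us-east-1:111122223333:table/ExampleTable"
    print(json.dumps(tenant_table_policy("12345", arn), indent=2))
```

Note that an exact-match condition like this only works when the partition key is exactly TENANT#12345. If keys carry a suffix after the tenant id, a pattern-based condition operator would be needed instead.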
In more extreme cases, and especially under a siloed model, resources may be isolated into their own VPCs where the network becomes the isolation boundary for the resources. Access in and out of the network is enforced by security groups and network access control lists.
One of the main dangers of not isolating tenant data in a multi-tenant SaaS application is the risk of unauthorised access to sensitive information. If data from different tenants is not properly isolated, it is possible for unauthorised users or applications to gain access to data that they are not supposed to see. This could include personal information, financial data, or other sensitive information that could be used for nefarious purposes.
In addition to these security risks, failing to isolate tenant data can also have negative impacts on the reputation of the application. If tenants are not confident that their data is being properly protected and isolated, they may be less likely to trust the application, and may be more likely to look for alternatives. This could lead to a loss of customers and revenue for the application, and could damage its reputation in the market.
To learn more about isolation within the context of SaaS, AWS has several resources on the topic.
SaaS identity
As users authenticate into your app, their identity must be linked to a tenant context. This allows downstream services to correctly scope their access to only the tenant's data. A common approach is to have a user's tenant id encoded as a custom claim inside the identity token returned from your identity provider. This identity token is then included as a header in all requests sent to your API endpoints, where it is used to authorise (or deny) the request. Using Amazon API Gateway, for example, a custom authoriser Lambda function can validate the identity token, extract the tenant context and decide whether or not to allow the request to proceed.
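The flow described above can be sketched as a token-based authoriser function. To stay self-contained, this example verifies an HS256-signed JWT using only the standard library. That is a simplification: identity providers commonly sign tokens with RS256, in which case a JWT library and the provider's published keys would be used instead. The claim name custom:tenant_id, the DEMO_SECRET placeholder, and the helper names are all assumptions for illustration. The event fields (authorizationToken, methodArn) and the response shape follow the API Gateway Lambda authoriser format for TOKEN authorisers.

```python
import base64
import hashlib
import hmac
import json
import time

TENANT_CLAIM = "custom:tenant_id"  # hypothetical claim name set by the identity provider
DEMO_SECRET = b"demo-secret"       # placeholder only; load real keys from a secret store


def _b64url_decode(segment):
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _signature(signing_input, secret):
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_token(claims, secret):
    """Local test helper: mint an HS256 token the way an identity provider might."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}"
    return f"{signing_input}.{_b64url_encode(_signature(signing_input, secret))}"


def verify_token(token, secret, now):
    """Return the claims when signature and expiry are valid, otherwise None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        if not isinstance(header, dict) or header.get("alg") != "HS256":
            return None
        expected = _signature(f"{header_b64}.{payload_b64}", secret)
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
    except ValueError:  # malformed segments, bad base64 or bad JSON
        return None
    if not isinstance(claims, dict):
        return None
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return None
    return claims


def _policy(principal_id, effect, method_arn, context=None):
    response = {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {"Action": "execute-api:Invoke", "Effect": effect, "Resource": method_arn}
            ],
        },
    }
    if context:
        response["context"] = context  # passed on to the backend integration
    return response


def authorize(event, secret, now):
    """Validate the token, extract the tenant context and allow or deny."""
    token = event.get("authorizationToken") or ""
    if token.startswith("Bearer "):
        token = token[len("Bearer "):]
    claims = verify_token(token, secret, now)
    tenant_id = claims.get(TENANT_CLAIM) if claims else None
    if not tenant_id:
        return _policy("anonymous", "Deny", event["methodArn"])
    principal = str(claims.get("sub", "unknown"))
    return _policy(principal, "Allow", event["methodArn"], {"tenantId": str(tenant_id)})


def handler(event, context):
    """Lambda entry point."""
    return authorize(event, DEMO_SECRET, time.time())
```

Returning the tenant id in the authoriser's context means downstream services receive it from a trusted source rather than parsing the token again or trusting a client-supplied value.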
How you build identity for SaaS varies depending on the protocol used by your chosen identity provider, with SAML and OIDC being popular choices. Authentication in the context of multi-tenant SaaS is a dense topic worthy of its own blog post. Luckily, AWS has a lot of prior art here, and the AWS blog post Building a Multi-Tenant SaaS Solution Using AWS Serverless Services is a good place to start.
Scaling and beyond
If you are just starting out on your SaaS journey on AWS, then the SaaS Lens of the AWS Well-Architected Framework and the SaaS fundamentals whitepaper are excellent resources.
The content of this post has been heavily influenced by the work of the AWS SaaS Factory team, with the AWS re:Invent 2022 talk SaaS architecture patterns: From concept to implementation by Tod Golding being a large inspiration. Tod is an excellent presenter and his talk goes even deeper on some of the topics covered here.
If you are further along on your SaaS journey, I also highly recommend the AWS re:Invent 2022 talk Scaling a SaaS company for public company readiness by Braze CTO Jon Hyman. Jon covers the operational side of running a SaaS business, and how 'going public' brings a certain level of scrutiny to the reliability of your systems.
Overall, building a SaaS application on AWS can provide a number of benefits, including access to a wide range of services and tools, a global and scalable infrastructure, cost-effective pricing, and a supportive community. This can help to accelerate the development and deployment of your SaaS application, and can make it easier to manage and operate the application over time.