From distributed monolith to microservices on AWS


Recently I undertook a major refactoring of a project, which involved breaking up an application into microservices on AWS. The original application consisted of several services, each its own Serverless Framework application.

A major challenge of working on this original codebase was that many of the services "cross-talked" by directly importing code across service boundaries rather than having clear separation of concerns and communicating by asynchronous channels. The application was, by all accounts, a distributed monolith.

After working with this codebase for some time, it became apparent to me that it was in dire need of a refactor. The refactor involved extracting services from the monorepo and significantly cutting down on cross-service dependencies.

Architecting services

While the Serverless Framework is excellent at what it does and offers a wide range of community-authored plugins, the core framework is still heavily biased towards building RESTful APIs. The AWS CDK (Cloud Development Kit), on the other hand, is more general and a great choice for provisioning a wide range of AWS services, not just REST APIs.

Rather than having a mix of Serverless framework and CDK across various services I decided it best to settle on adopting the AWS CDK across all new services. The fact that not all services are exposed via an API was also a factor in this decision.

Breaking apart the existing monolith involved moving each service to its own Git repository and converting the Serverless Framework code to the equivalent CDK infrastructure. Each application consists of two stacks: one for infrastructure, the other for application code.

The infrastructure stack is for creating AWS resources which need to be deployed once per service. For most services this consisted of a single DynamoDB table and an AWS CodePipeline which is responsible for deploying the service’s other stack, the application stack.

The application stack contains a collection of Lambda functions which make up the bulk of the service. Most services contain Lambda functions exposed by an API Gateway and each API Gateway is mounted at a path on a custom domain.

It’s tempting to deploy a single function to multiple paths on an API and use routing logic within the function to determine the matched route. However, having a dedicated function per route is a better solution and more in line with the single responsibility principle (SRP). A nice feature of AWS Lambda is that each function gets its own Amazon CloudWatch log group, created automatically. With a single ‘mega function’ handling multiple routes, all logs are collected under the one log group, which makes debugging more complicated.


Another challenge of porting the original monolithic application to this newer service-oriented architecture was the data. The original application used DynamoDB, but it was clearly modelled through the lens of a relational database, with one entity type per table. This is a common (and forgivable) way of approaching data storage when first switching to a NoSQL database.

It’s a common best practice when modelling data with DynamoDB to follow a single table design. To anyone with RDBMS experience this is an extremely foreign concept. Single table design involves storing many different entity types within the one table, rather than one type per table as you would in a traditional relational database.

DynamoDB, like other NoSQL databases, does not have complex query operations like JOIN so data needs to be modelled to optimise for read performance ahead of time. This involves planning and discussions with stakeholders to make sure access patterns are identified early on.

As part of the refactoring efforts on this project, data was often re-modelled to better fit this "single table design" way of thinking. Having each service responsible for creating its own table helped create a clearer separation of concerns. A service only ever reads from and writes to its own table, never directly to another service’s table.
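A minimal sketch of what single table design can look like in code, assuming a hypothetical user service that stores users and their registered devices in one table. The `pk`/`sk` attribute names, the `USER#`/`DEVICE#` key prefixes and the entity shapes are illustrative assumptions, not the project's actual schema, and `queryByPartition` is only an in-memory stand-in for a DynamoDB query on the partition key.

```typescript
// Hypothetical entities owned by a user service.
interface User { userId: string; email: string }
interface Device { userId: string; deviceId: string; platform: "ios" | "android" }

// Every item carries the same generic key attributes, whatever its entity type.
interface Keys { pk: string; sk: string; entityType: string }
type TableItem<T> = T & Keys;

function userItem(user: User): TableItem<User> {
  return { ...user, pk: `USER#${user.userId}`, sk: "PROFILE", entityType: "User" };
}

function deviceItem(device: Device): TableItem<Device> {
  return {
    ...device,
    pk: `USER#${device.userId}`, // same partition as the owning user
    sk: `DEVICE#${device.deviceId}`,
    entityType: "Device",
  };
}

// In-memory stand-in for a query on the partition key: one request returns
// the user and all of their devices, ordered by sort key, with no JOIN.
function queryByPartition<T extends Keys>(items: T[], pk: string): T[] {
  return items
    .filter((item) => item.pk === pk)
    .sort((a, b) => (a.sk < b.sk ? -1 : a.sk > b.sk ? 1 : 0));
}
```

Because the user and its devices share a partition key, the access pattern "fetch a user with all of their devices" is answered by a single query, which is the kind of read the data is modelled for ahead of time.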

Communicating between services

Inter-service communication is handled by Amazon EventBridge. As things happen within a service, events are dispatched to an event bus shared between all services. Dedicated functions within services are configured with event rules matching the pattern of events dispatched from the other services they are interested in hearing from. What I like about this pattern of communication is that the services remain decoupled. The service dispatching the event doesn’t need to know who is listening to it; that’s not its concern. The consuming service only needs to know about the events it is interested in consuming and what to expect in the event payloads.
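To make the rule matching concrete, here is a rough sketch of a consumer's side of the exchange. The envelope fields (`id`, `source`, `detail-type`, `time`, `detail`) are a subset of what EventBridge delivers to a target; everything else is an assumption for illustration: the `user` source name, the `user:created` detail-type, the payload shape, and `matchesPattern`, which is a deliberately simplified stand-in for the matching EventBridge performs itself.

```typescript
// Subset of the EventBridge event envelope that a target receives.
interface BusEvent<TDetail> {
  id: string;
  source: string;
  "detail-type": string;
  time: string;
  detail: TDetail;
}

// An event pattern lists the accepted values for each field it cares about.
interface EventPattern {
  source?: string[];
  "detail-type"?: string[];
}

// Simplified stand-in for rule matching (exact values on two fields only);
// the real service evaluates far richer patterns on the consumer's behalf.
function matchesPattern(event: BusEvent<unknown>, pattern: EventPattern): boolean {
  const sourceOk = !pattern.source || pattern.source.includes(event.source);
  const typeOk =
    !pattern["detail-type"] || pattern["detail-type"].includes(event["detail-type"]);
  return sourceOk && typeOk;
}

// Hypothetical rule owned by an email service: it names the events it wants
// to hear about, and nothing about who else might be listening.
const welcomeEmailRule: EventPattern = {
  source: ["user"],
  "detail-type": ["user:created"],
};

// Hypothetical target function; it depends only on the event payload shape.
function sendWelcomeEmail(event: BusEvent<{ userId: string; email: string }>): string {
  return `Welcome email queued for ${event.detail.email}`;
}
```

The dispatching service appears nowhere in this code beyond the values in the pattern, which is the decoupling described above.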

Each event has a ‘detail-type’ field which contains a unique string describing the type of the event. For this I find the naming convention <service-name>:<action> works well. For example, when a new user is created in the user service an event is dispatched with the detail-type set to "user:created". Inside this event’s detail field I include the record which was just created.
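As a sketch of that convention, the helper below builds one entry for an EventBridge `PutEvents` request. The `EventBusName`, `Source`, `DetailType` and `Detail` field names follow the PutEvents entry shape; the `buildEvent` helper, the `shared-bus` bus name, the set of actions and the use of the service name as `Source` are assumptions made for the example.

```typescript
// Subset of the fields in one EventBridge PutEvents request entry.
interface PutEventsEntry {
  EventBusName: string;
  Source: string;
  DetailType: string;
  Detail: string; // JSON-encoded payload
}

// Actions are restricted to the past tense: events describe what has happened.
type Action = "created" | "updated" | "deleted";

// Hypothetical helper applying the <service-name>:<action> convention.
function buildEvent<T>(
  service: string,
  action: Action,
  record: T,
  busName: string = "shared-bus", // assumed name of the shared event bus
): PutEventsEntry {
  return {
    EventBusName: busName,
    Source: service,
    DetailType: `${service}:${action}`,
    Detail: JSON.stringify(record), // the record that was just written
  };
}

// Example: the user service announcing a newly created user.
const userCreated = buildEvent("user", "created", { userId: "u1", email: "a@example.com" });
```

With the AWS SDK for JavaScript v3, an entry like this would typically be passed in the `Entries` array of a `PutEventsCommand`; that call is omitted here to keep the sketch free of dependencies.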

Dispatched events are always in the past tense and services only ever talk about themselves. A service should never send an event destined for another service. To put it another way, a service should never tell another service to do something via an event, as this leads to more tightly coupled dependencies between services. If a service needs to invoke an action in another service directly, then going via EventBridge is pointless and you would be better off invoking the target function directly.

EventBridge can publish custom schemas in OpenAPI 3 and JSON Schema Draft 4 formats, which describe the shape of the data contained in an event. Once a schema is created within EventBridge, code bindings can be downloaded for Java, Python, and TypeScript. This is something we have not yet implemented on the project but are interested in exploring.

Sharing code across services

Having multiple services means there’s quite a bit of boilerplate code which is common to all services. To prevent having to duplicate code across each service repository there are two options, Lambda Layers or AWS CodeArtifact. I’ve dabbled with Lambda Layers in the past and found it useful for things which are tricky to compile and package like Chromium; however in this instance I found CodeArtifact to be the better choice.

CodeArtifact provides a way for organisations to host their own private package repositories. As TypeScript is the primary language used across the organisation, this means hosting our own private npm repository. We publish our own packages to this repository rather than to the public npm registry. Once logged in, other services simply npm install as normal; packages are pulled from this private repository rather than directly from the public npm registry.

The CodeArtifact-hosted npm repository also acts as a cache in front of the public npm registry. When you npm install while connected to the private repository, CodeArtifact delivers the package if it already has a copy stored. If it doesn’t, it copies the package from the public npm registry and stores it for future use.

To set up CodeArtifact I created a dedicated ‘shared services’ AWS account and then created the CodeArtifact domain and resource policy allowing access to other AWS accounts within the same AWS Organisation.

Currently there are two shared packages. One contains custom CDK constructs, the main one being a reusable ‘serverless pipeline’ construct which sets up a CI/CD pipeline for a service. The pipeline construct creates an AWS CodePipeline with a service’s Git repository as its source and AWS CodeBuild for building and deploying. The construct also includes AWS Chatbot for dispatching build telemetry into our team Slack channels.

The other package is a shared library containing things like Lambda middleware for handling invocations sourced by API Gateway and other useful utility functions.
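To give a feel for what such middleware might look like, here is a minimal sketch, not the contents of the actual shared library. `ApiEvent` and `ApiResult` are cut-down versions of the API Gateway proxy event and result shapes; `withJsonApi`, `jsonResponse` and the `createUser` handler are hypothetical names invented for the example.

```typescript
// Cut-down API Gateway proxy integration event and result shapes.
interface ApiEvent { body: string | null }
interface ApiResult {
  statusCode: number;
  headers: Record<string, string>;
  body: string;
}

type ApiHandler = (event: ApiEvent) => Promise<ApiResult>;

// Serialises a payload into the response shape API Gateway expects.
function jsonResponse(statusCode: number, payload: unknown): ApiResult {
  return {
    statusCode,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}

// Hypothetical middleware: parses the JSON body, runs the business logic and
// maps both success and failure onto HTTP responses, so individual handlers
// don't have to repeat that boilerplate.
function withJsonApi<TBody, TResult>(
  logic: (body: TBody) => Promise<TResult>,
): ApiHandler {
  return async (event) => {
    let body: TBody;
    try {
      // Parsed but not validated; a real library would check the shape too.
      body = JSON.parse(event.body ?? "{}") as TBody;
    } catch {
      return jsonResponse(400, { message: "Request body must be valid JSON" });
    }
    try {
      return jsonResponse(200, await logic(body));
    } catch (error) {
      console.error(error); // ends up in the function's own log group
      return jsonResponse(500, { message: "Internal server error" });
    }
  };
}

// Example handler: only the business logic remains.
const createUser = withJsonApi(async (body: { email: string }) => ({
  created: body.email,
}));
```

Publishing a wrapper like this as a shared package means each service's route handlers contain only their own logic.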

Future improvements and lessons learned

The AWS services you utilise have their own set of quirks and limitations. For example, EventBridge has an "at least once" guarantee that an event will be delivered to a target matching a rule. In other words, assuming your target is a Lambda function, it will be invoked at least once but might be invoked more than once. Depending on your workload, that might be something you need to guard against in your implementation. Sending a push notification to a user more than once might not be a show-stopper, but charging their credit card more than once probably is!
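One way to guard against duplicate deliveries is to record each event's id before acting on it and skip any id seen before. The sketch below assumes the event's `id` is the same on every redelivery, and uses an in-memory set purely as a stand-in for a persistent store (in a real service this might be a conditional write to the service's DynamoDB table, which would make `claim` asynchronous). `ProcessedEventStore`, `chargeOnce` and the payment event shape are all hypothetical.

```typescript
// Stand-in for a persistent record of processed events.
interface ProcessedEventStore {
  // Returns true if the id was recorded just now, false if already seen.
  claim(eventId: string): boolean;
}

// In-memory implementation for illustration only; it would not survive
// across Lambda invocations, so a real service needs a durable store.
class InMemoryStore implements ProcessedEventStore {
  private readonly seen = new Set<string>();

  claim(eventId: string): boolean {
    if (this.seen.has(eventId)) return false;
    this.seen.add(eventId);
    return true;
  }
}

// Hypothetical event asking nothing of anyone: it reports a completed order.
interface OrderCompletedEvent {
  id: string;
  detail: { userId: string; amountCents: number };
}

// Runs the side effect at most once per event id.
function chargeOnce(
  event: OrderCompletedEvent,
  store: ProcessedEventStore,
  charge: (userId: string, amountCents: number) => void,
): "charged" | "skipped" {
  if (!store.claim(event.id)) return "skipped"; // duplicate delivery
  charge(event.detail.userId, event.detail.amountCents);
  return "charged";
}
```

Note the trade-off in this ordering: if `charge` fails after the id has been claimed, a retry will be skipped, so a production version would also need to release the claim or record the outcome.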

Knowing where the divisions of your application lie requires considerable thought and you might not always get it right. For example, on this application I ended up creating separate services for email, push notifications and in-app notifications. One might argue that those three services could have been rolled up into a single ‘notifications’ service, and I’m not sure that would be wrong either. In general, my advice would be to start with fewer services and then subdivide only as necessary.

Hi, I'm Will

I'm a lead software engineer from Melbourne, Australia, with over 15 years' experience. Got a project you'd like to discuss? Reach me below.