Two illustrations showing the choreography and orchestration saga patterns

Coordinating distributed systems with the Saga Pattern on AWS

24 February, 2025

Microservices architecture has revolutionised the way we build and deploy applications, offering improved scalability and flexibility. However, it also introduces new challenges, particularly when it comes to managing distributed transactions across multiple services. Enter the Saga pattern, a crucial design pattern for maintaining data consistency in a microservices environment.

This blog post explores two primary implementations of the Saga pattern: choreography-based and orchestration-based sagas. We'll delve into the pros and cons of each approach, helping you make informed decisions when designing your microservices architecture.

What is the Saga pattern?

The Saga Pattern is a design pattern used in distributed systems to manage complex, long-running transactions that span multiple services. It's particularly useful in microservices architectures where maintaining data consistency across various services can be challenging.

In essence, the Saga Pattern breaks down a large transaction into a series of smaller, local transactions. Each of these local transactions updates data within a single service. The pattern also defines compensating transactions for each step, which can be used to undo changes if any part of the overall transaction fails.
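To make this concrete, here is a minimal, in-process sketch of the idea: each step pairs a local transaction with a compensating transaction, and a failure triggers the compensations of the steps that already completed, in reverse order. The names `SagaStep`, `run_saga` and the two example steps are illustrative assumptions, not part of any library, and a real saga would persist its progress rather than keep it in memory.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]        # the local transaction in one service
    compensation: Callable[[], None]  # semantically undoes the action


def run_saga(steps: List[SagaStep]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action()
            completed.append(step)
        except Exception:
            for done in reversed(completed):
                done.compensation()
            return False
    return True


log: List[str] = []


def fail_payment() -> None:
    # Hypothetical failing step, used only to trigger compensation.
    raise RuntimeError("card declined")


steps = [
    SagaStep("reserve_stock",
             lambda: log.append("stock reserved"),
             lambda: log.append("stock released")),
    SagaStep("charge_card",
             fail_payment,
             lambda: log.append("charge refunded")),
]

succeeded = run_saga(steps)
```

In this run the second step fails, so only the first step's compensation executes: the failed step never completed and has nothing to undo.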

The main goals of the Saga Pattern are to:

Maintain data consistency across services without using distributed transactions
Provide a mechanism for rolling back or compensating when failures occur
Improve system resilience and fault tolerance

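By using sagas, systems trade the isolation of strict ACID (Atomicity, Consistency, Isolation, Durability) transactions for eventual consistency: intermediate states are visible to other transactions until the saga either completes or is compensated. In distributed environments this is often more practical than holding locks across services.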

Choreography-based vs Orchestration-based

When implementing the Saga Pattern, there are two main approaches: orchestration-based and choreography-based. Each has its own advantages and use cases.

Orchestration-based Sagas

Orchestration-based sagas use a central orchestrator, often called a Saga Execution Coordinator, to manage the transaction flow and direct participating services. This centralised method makes it easier to understand and visualise the overall process. It also simplifies error handling and compensation logic, as these can be managed from a single point. Complex, conditional workflows are generally simpler to implement with this approach.

The trade-offs for orchestration include the potential for the orchestrator to become a single point of failure. This method may also introduce higher coupling, as the orchestrator needs to know about all participating services. Performance can be impacted due to the additional communication required with the orchestrator.

Choreography-based Sagas

A choreography-based saga has no central coordinator. Instead, each service publishes domain events that trigger local transactions in other services. The services react to these events and perform their part of the transaction. This method offers a more decoupled design, as services don't need to know about each other directly. It also provides greater flexibility, making it easier to add new steps to the process. Additionally, the lack of central coordination can lead to improved performance.

However, choreography-based sagas are not without drawbacks. The distributed nature of this approach can make it harder to understand and debug the overall flow. Implementing rollbacks becomes more challenging, as each service needs to know how to compensate for failed transactions. There's also a risk of cyclical event dependencies if the system is not carefully designed.

The choice between choreography and orchestration often depends on the complexity of the transaction, the number of services involved, and the specific requirements of the system. Some implementations even use a hybrid approach, combining elements of both styles to leverage the strengths of each method while mitigating their respective weaknesses.
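A small in-memory sketch can show what choreography looks like in practice: no component knows the whole flow, and each handler only reacts to one event and publishes the next. The `EventBus` class, the handler names and the event names are assumptions made for illustration, and the synchronous dispatch stands in for a real message broker.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventBus:
    """Toy synchronous event bus; a real system would use a broker."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)
        self.history: List[str] = []

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, detail: dict) -> None:
        self.history.append(event_type)
        for handler in list(self._handlers[event_type]):
            handler(detail)


bus = EventBus()
stock = {"widget": 1}


def reserve_stock(order: dict) -> None:
    # Inventory service: local transaction, then announce the result.
    stock[order["sku"]] -= 1
    bus.publish("InventoryReserved", order)


def process_payment(order: dict) -> None:
    # Payment service: a hypothetical rule rejects large amounts.
    if order["amount"] > 100:
        bus.publish("PaymentFailed", order)
    else:
        bus.publish("PaymentProcessed", order)


def release_stock(order: dict) -> None:
    # Inventory service: compensating transaction for reserve_stock.
    stock[order["sku"]] += 1


bus.subscribe("OrderCreated", reserve_stock)
bus.subscribe("InventoryReserved", process_payment)
bus.subscribe("PaymentFailed", release_stock)

bus.publish("OrderCreated", {"order_id": "o-1", "sku": "widget", "amount": 250})
```

Note that the compensation lives in the inventory service itself, subscribed to the failure event. That is the decoupling benefit and the debugging cost in one place: the rollback path exists only as a set of subscriptions.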

Choreography-based Sagas with Amazon EventBridge

Amazon EventBridge is a serverless event bus that enables event-driven architectures by routing events between AWS services, SaaS applications, and custom services.

The EventBridge Advantage

EventBridge enables choreographed sagas through:

Decentralised coordination: Services communicate via events rather than direct API calls
Loose producer/consumer coupling: New consumers subscribe by adding rules, with no code changes to producers
Built-in event routing: Filter and route events using content-based rules
Native schema registry: Maintain event structure consistency across services
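As a rough illustration of content-based routing, the sketch below writes an EventBridge-style event pattern as a Python dictionary and checks events against it. The `source` value and the `currency` field are made-up examples, and `matches` is a deliberately simplified stand-in for the service's matching: it only handles lists of literal values and nested objects, not operators such as prefix or numeric matching.

```python
from typing import Any, Dict

# Pattern a hypothetical Inventory Service rule might use: each field
# lists the literal values it accepts.
ORDER_CREATED_PATTERN: Dict[str, Any] = {
    "source": ["orders.service"],
    "detail-type": ["OrderCreated"],
    "detail": {"currency": ["AUD", "NZD"]},
}


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Simplified matcher: every pattern field must be present in the
    event, and its value must be one of the listed literals."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict):
                return False
            if not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


event = {
    "source": "orders.service",
    "detail-type": "OrderCreated",
    "detail": {"orderId": "o-1", "currency": "AUD", "total": 42},
}

routed = matches(ORDER_CREATED_PATTERN, event)
```

Fields that the pattern does not mention (here `orderId` and `total`) are ignored, which is what lets producers add data to events without breaking existing consumers.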

Example: An e-commerce order flow might involve:

Order Service emits "OrderCreated"
Inventory Service consumes event → reserves stock → emits "InventoryReserved"
Payment Service reacts → processes payment → emits "PaymentProcessed"
Fulfillment Service finalises → emits "OrderCompleted"
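The first step of this flow could be sketched as follows. `build_entry` produces an entry in the shape that the EventBridge `PutEvents` API expects, and `publish` sends it with boto3. The bus name `saga-bus`, the source `orders.service` and the payload fields are assumptions for illustration; `publish` needs boto3 and AWS credentials, so it is defined here but not called.

```python
import json
from typing import Any, Dict


def build_entry(detail_type: str,
                detail: Dict[str, Any],
                source: str = "orders.service",
                bus_name: str = "saga-bus") -> Dict[str, str]:
    """Build one PutEvents entry; Detail must be a JSON string."""
    return {
        "Source": source,
        "DetailType": detail_type,
        "Detail": json.dumps(detail),
        "EventBusName": bus_name,
    }


def publish(entry: Dict[str, str]) -> None:
    """Send the entry to EventBridge (requires boto3 and credentials)."""
    import boto3  # imported lazily so the sketch runs without it

    client = boto3.client("events")
    response = client.put_events(Entries=[entry])
    # PutEvents can partially fail without raising, so check the count.
    if response["FailedEntryCount"] > 0:
        raise RuntimeError(f"event not accepted: {response['Entries']}")


entry = build_entry("OrderCreated", {"orderId": "o-1", "sku": "widget"})
```

Each downstream service would publish its own event (`InventoryReserved`, `PaymentProcessed`, and so on) the same way, with its own `source`.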

Addressing Choreography Concerns

Debugging Complexity: Use AWS X-Ray trace propagation through event headers
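Cyclic Dependencies: Keep rule patterns narrow so a service never matches the events it emits itself, and attach dead-letter queues to rule targets to capture events that cannot be delivered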
Compensation Handling: Trigger rollback events like "PaymentFailed" that services must handle
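Event Overload: Use EventBridge archive/replay to reprocess events once a struggling consumer recovers; archives do not throttle traffic, so consider buffering bursts with a queue such as Amazon SQS as the rule target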

When to Choose This Approach

EventBridge excels for sagas requiring:

Organic growth of business processes
Independent service lifecycles
Horizontal scalability of event processors
Broadcast-style notifications (1 event → N consumers)

Orchestration-based Sagas with AWS Step Functions

AWS Step Functions is a workflow orchestration service that coordinates distributed components through state machines defined in Amazon States Language (ASL).

The Step Functions Advantage

Step Functions simplifies orchestrated sagas through:

Centralised visibility: Single execution history for entire transaction
Deterministic workflows: Predefined sequence with error handling
State management: Built-in checkpoints and retry mechanisms
Human-in-the-loop: Direct integration with manual approval steps

Example: Loan application processing:

Initiate application record
Parallel credit checks
Fraud detection analysis
Final approval/rejection
Automated cleanup if any step fails
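One way this flow might look in Amazon States Language is sketched below, built as a Python dictionary so it can be serialised with `json.dumps`. All state names and Lambda function names are hypothetical. Every step routes errors to a `Cleanup` state through a `Catch` clause, which plays the role of the compensating transaction.

```python
import json
from typing import Optional

# Any error in a step sends the execution to the Cleanup state.
CATCH_ALL = [{"ErrorEquals": ["States.ALL"],
              "ResultPath": "$.error",
              "Next": "Cleanup"}]


def lambda_task(function_name: str, result_path: str,
                next_state: Optional[str] = None,
                catch: bool = True) -> dict:
    """Task state that invokes a (hypothetical) Lambda function."""
    state = {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
        "ResultPath": result_path,
    }
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    if catch:
        state["Catch"] = CATCH_ALL
    return state


def branch(name: str, function_name: str) -> dict:
    # States inside a branch cannot jump outside it, so errors are
    # caught on the enclosing Parallel state instead.
    return {"StartAt": name,
            "States": {name: lambda_task(function_name, "$.result",
                                         catch=False)}}


definition = {
    "Comment": "Loan application saga (illustrative sketch)",
    "StartAt": "CreateApplication",
    "States": {
        "CreateApplication": lambda_task("create-application",
                                         "$.application", "CreditChecks"),
        "CreditChecks": {
            "Type": "Parallel",
            "Branches": [branch("BureauA", "credit-check-a"),
                         branch("BureauB", "credit-check-b")],
            "ResultPath": "$.credit",
            "Catch": CATCH_ALL,
            "Next": "FraudDetection",
        },
        "FraudDetection": lambda_task("fraud-detection", "$.fraud",
                                      "Decision"),
        "Decision": lambda_task("approve-or-reject", "$.decision"),
        "Cleanup": lambda_task("cleanup-application", "$.cleanup",
                               "ApplicationFailed", catch=False),
        "ApplicationFailed": {"Type": "Fail",
                              "Error": "LoanSagaFailed",
                              "Cause": "A step failed; cleanup has run"},
    },
}

asl_json = json.dumps(definition, indent=2)
```

The resulting `asl_json` string is what would be supplied as the state machine definition. A single cleanup state keeps the sketch short; a fuller saga would usually chain one compensating state per completed step.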

Addressing Orchestration Concerns

Orchestrator Coupling: Hide service details behind API Gateway endpoints
Long-Running Flows: Use callback patterns for external human approvals
State Machine Bloat: Break complex workflows into nested state machines
Regional Outages: Implement multi-region active/passive deployments
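The callback pattern mentioned above can be sketched like this. A task whose resource ends in `.waitForTaskToken` pauses the execution and hands a task token to an external system; the workflow resumes only when that token is returned through `SendTaskSuccess` or `SendTaskFailure`. The function name, state names and payload fields are assumptions for illustration, and `resume` needs boto3 and AWS credentials, so it is defined but not called.

```python
import json

# State that pauses until a human decision arrives (names hypothetical).
approval_state = {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
    "Parameters": {
        "FunctionName": "request-manual-approval",
        "Payload": {
            "application.$": "$.application",
            # The context object exposes the token for this task.
            "taskToken.$": "$$.Task.Token",
        },
    },
    "TimeoutSeconds": 86400,  # fail the task if nobody answers in a day
    "Next": "Decision",
}


def resume(task_token: str, approved: bool) -> None:
    """Called by the approval system once a person has decided."""
    import boto3  # imported lazily so the sketch runs without it

    sfn = boto3.client("stepfunctions")
    if approved:
        sfn.send_task_success(taskToken=task_token,
                              output=json.dumps({"approved": True}))
    else:
        sfn.send_task_failure(taskToken=task_token,
                              error="Rejected",
                              cause="Manual reviewer rejected the request")


state_json = json.dumps(approval_state, indent=2)
```

Setting a timeout matters here: without one, an approval that never arrives leaves the saga waiting indefinitely instead of triggering its compensation path.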

When to Choose This Approach

Step Functions shines for sagas requiring:

Strict sequencing of business-critical operations
Central audit trails for compliance
Complex compensation logic
Mixed automated/human decision points

Conclusion: Choosing Your Saga Strategy

Both choreography and orchestration patterns address distributed transaction challenges, but through fundamentally different lenses:

Choose EventBridge Choreography When:

Your ecosystem requires organic, decentralised growth
Services need full autonomy over their transaction logic
Event broadcasting (1:N relationships) is a core requirement

Opt for Step Functions Orchestration When:

Complex business logic demands centralised control
Strict execution sequencing is non-negotiable
Audit trails and visibility are compliance requirements

Modern systems often combine both approaches:

Use Step Functions for core transactional workflows
Leverage EventBridge for side effects and notifications
Implement hybrid patterns like orchestrated saga chunks with choreographed compensation

On AWS, these patterns become operational rather than theoretical. EventBridge provides the nervous system for reactive architectures, while Step Functions offers the central brain for procedural workflows. By understanding their strengths and trade-offs, you can design systems that balance flexibility with control – the true art of distributed systems engineering.

Will Dady
