2PC vs Sagas (distributed transactions)

Tags: Transactions, Cloud, Microservices, Distributed Computing, Saga

Transactions Problem Overview


I'm developing my understanding of distributed systems and of how to maintain data consistency across them, where business transactions cover multiple services, bounded contexts and network boundaries.

Here are two approaches which I know are used to implement distributed transactions:

  • 2-phase commit (2PC)
  • Sagas

2PC is a protocol that lets applications transparently use global ACID transactions with the support of the platform. Because it is embedded in the platform, it is transparent to the business logic and the application code, as far as I know.

Sagas, on the other hand, are a series of local transactions, where each local transaction mutates and persists the entities along with some flag indicating the phase of the global transaction, and then commits the change. In other words, the state of the transaction is part of the domain model. Rollback is a matter of committing a series of "inverted" transactions. In either case, events emitted by the services trigger these local transactions.

Now, when and why would one use sagas over 2PC and vice versa? What are the use cases and pros/cons of both? Especially, the brittleness of sagas makes me nervous, as the inverted distributed transaction could fail as well.

Transactions Solutions


Solution 1 - Transactions

In my understanding (not a big user of 2PC since I consider it limiting):

  • Typically, 2PC is for immediate transactions.
  • Typically, Sagas are for long running transactions.

Use cases are obvious afterwards:

  • 2PC allows you to commit the whole transaction within a single request or so, spanning that request across systems and networks. Assuming each participating system and network follows the protocol, you can commit or roll back the entire transaction seamlessly.
  • A Saga allows you to split a transaction into multiple steps spanning long periods of time (not necessarily across systems and networks).
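To make the first bullet concrete, here is a minimal in-memory sketch of the two phases. The `Participant` class and the `two_phase_commit` function are hypothetical names invented for illustration; a real platform (an XA transaction manager, for example) does this for you and additionally handles durable logging, timeouts and coordinator recovery, none of which is shown here.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical resource manager taking part in a global transaction."""
    name: str
    can_commit: bool = True   # simulated vote for the prepare phase
    state: str = "idle"       # idle -> prepared -> committed / rolled_back

    def prepare(self) -> bool:
        # Phase 1: a real participant would lock resources here and
        # durably promise to commit if the coordinator asks it to.
        if self.can_commit:
            self.state = "prepared"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled_back"


def two_phase_commit(participants: list[Participant]) -> bool:
    """Return True if everyone committed, False if everyone rolled back."""
    # Phase 1 (voting): all() stops at the first "no" vote.
    if all(p.prepare() for p in participants):
        # Phase 2 (completion): unanimous "yes", so commit everywhere.
        for p in participants:
            p.commit()
        return True
    # At least one "no": roll back every participant.
    for p in participants:
        p.rollback()
    return False
```

The point of the sketch is only that the outcome is all-or-nothing: no participant ends up committed while another one is rolled back.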

Example:

  • 2PC: Save a Customer for every received Invoice request, where the two are managed by 2 different systems.
  • Sagas: Book a flight itinerary consisting of several connecting flights, where each individual flight is operated by a different airline.

I personally consider Saga capable of doing what 2PC can do. Opposite is not accurate.

I think Sagas are universal, while 2PC involves platform/vendor lock-in.

Updates/Additions (optional read):

My answer has been here for a while, and I see that the topic has gained some traction since.

I want to clarify a couple of points on this topic for those who come here and are not sure which route to take.

  1. Saga is a domain modeling (i.e., technology-agnostic) concept, while 2PC is a technology-specific notion with some (maybe many) vendors implementing it. As an analogy, it's like comparing domain events (bare objects) with message brokers (RabbitMQ, for example).
  2. 2PC can be a good choice if you are already married to platforms that implement such a protocol. Not all do, and thus I call this a limitation. I see that people have argued that Saga is more limiting because it's harder to implement, but that's like saying an orange is juicier than an apple is sweet. They are two different things.
  3. Consider the human factor too. Some people (developers, architects) are technology geeks. They call business logic or the domain model boilerplate code. I belong to another group of people, who consider the domain model the most valuable piece of code. Such a preference also affects the choice between Saga and 2PC, as well as who likes what. I can't explain here why you should prefer domain-driven thinking over technology-driven solutions, because it won't fit on this page and you would stop reading my answer. Please find more online, maybe through my writings.

@freakish made a fair point in the comments: 2PC favors consistency, while Saga degrades it to "eventual consistency." If you have a situation where consistency is more important than availability (see the CAP theorem), then maybe you do need a system transaction protocol like 2PC. Otherwise, I recommend going with business transactions such as Saga. Please read about System Transactions vs Business Transactions, e.g. in PEAA.

Solution 2 - Transactions

Your comparisons are not logically consistent. Older solutions like Sagas take more work to implement than XA/2PC.

> Typically, 2PC is for immediate transactions. Typically, Sagas are for long running transactions.

This is incorrect. XA transactions can run for weeks if you want; running without timeouts is an option. I've worked with systems where XA/2PC transactions run for a week, and some where they run for 1ms.

> I personally consider Saga capable of doing what 2PC can do. Opposite is not accurate.

No, Sagas are a more primitive solution than XA. XA is the newer solution. With Sagas, boilerplate needs to be developed to handle the transactions. XA moves the common elements of transaction management to the underlying platform, reducing the boilerplate bloat developers have to manage.

> I think Sagas are universal, while 2PC involves platform/vendor lock-in.

XA spec has been implemented by many vendors and is pretty universal. Implementing 2PC across multiple platforms across multiple organizations has not been a problem for over 30 years.

Solution 3 - Transactions

I'm adding my answer in order to address the main difference between sagas and 2PC, which is the consistency model.

> Sagas, on the other hand, are a series of local transactions, where each local transaction mutates and persists the entities along with some flag indicating the phase of the global transaction, and then commits the change.

Interesting description. What exactly is this flag? Is each node supposed to commit changes after the global transaction completes (with this flag tracking that)? And does each node keep local changes invisible to the outside until this happens? If that's the case, then how is that different from 2PC? If that's not the case, then what is this flag even for?

Generally, as far as I understand, a saga is a sequence of local transactions. If any of the nodes in the sequence fails, then the flow is reversed and each node that already succeeded runs a compensating transaction, in reverse order.
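As a rough illustration of that flow, here is a minimal in-process sketch. The `run_saga` function and the `(name, action, compensation)` step shape are hypothetical and made up for this example; in a real system each action would be a local transaction on a different node, triggered by messages or events, and compensations could themselves fail (which this sketch deliberately ignores).

```python
from typing import Callable

# Hypothetical step shape: (name, action, compensation).
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step], log: list[str]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()  # local transaction on one node
        except Exception:
            log.append(f"{name} failed")
            # Reverse the flow: undo what already succeeded, newest first.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"{done_name} compensated")
            return False
        log.append(f"{name} done")
        completed.append(step)
    return True
```

Note that nothing here hides the intermediate state: after the first step has run and before its compensation runs, its effect is visible to everyone else.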

With this idea, however, we encounter several issues. The first one is what you've already noticed yourself: what if compensating transactions fail? What if communication fails at any step? But there's more: with that approach dirty reads are possible. Say Node1 succeeds and Node2 fails. We then issue a compensating transaction on Node1. But what if some other process reads the data after Node1 was updated but before the compensating transaction reverts that update? That is a potential inconsistency (depending on your requirements).
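The dirty read can be shown with a deterministic, single-process simulation. Everything here (the `Account` class, the amounts, the `simulate_dirty_read` function) is invented for illustration; in practice the "other process" would be a concurrent request hitting Node1 between the two local transactions.

```python
class Account:
    """Hypothetical account record stored on Node1."""

    def __init__(self, balance: int) -> None:
        self.balance = balance


def simulate_dirty_read() -> tuple[int, int]:
    node1 = Account(100)

    # Saga step 1: the local transaction on Node1 commits immediately.
    node1.balance -= 80

    # Another process reads Node1 now. It sees a value that belongs to a
    # global transaction which has not finished and will be undone.
    observed = node1.balance

    # Saga step 2 on Node2 fails, so Node1 runs its compensation.
    node1.balance += 80

    return observed, node1.balance


observed, final = simulate_dirty_read()
print(observed, final)  # prints "20 100"
```

The reader acted on a balance of 20 even though, from the point of view of the global transaction, the account never "really" held anything other than 100.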

Generally, sagas are eventually consistent and efficient (no global resource locking) by design. If you have full control over all nodes, then a saga can be made strongly consistent, but that requires a lot of manual (and not obvious, e.g. handling communication issues) effort, and will likely require some resource locking (and thus we lose performance). In that case, why not use 2PC to begin with?
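One way to picture the "manual effort plus locking" is an application-level lock flag on the record, sometimes called a semantic lock. The sketch below is an assumption about how that could look, not a description of any particular framework; `Account`, `AccountLocked`, `begin_step`, `complete_step`, `compensate_step` and `read_balance` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


class AccountLocked(Exception):
    """Raised when a saga still holds the record."""


@dataclass
class Account:
    balance: int
    pending_saga: Optional[str] = None  # id of the saga holding the lock


def begin_step(acct: Account, saga_id: str, delta: int) -> None:
    # Local transaction: apply the change AND mark the record as pending.
    if acct.pending_saga is not None:
        raise AccountLocked(acct.pending_saga)
    acct.pending_saga = saga_id
    acct.balance += delta


def complete_step(acct: Account, saga_id: str) -> None:
    # The whole saga succeeded: release the lock and keep the change.
    if acct.pending_saga == saga_id:
        acct.pending_saga = None


def compensate_step(acct: Account, saga_id: str, delta: int) -> None:
    # The saga failed elsewhere: revert the change and release the lock.
    if acct.pending_saga == saga_id:
        acct.balance -= delta
        acct.pending_saga = None


def read_balance(acct: Account) -> int:
    # Readers refuse to observe a value owned by an unfinished saga.
    if acct.pending_saga is not None:
        raise AccountLocked(acct.pending_saga)
    return acct.balance
```

This removes the dirty read, but readers and other sagas are now blocked (or rejected) while the flag is set, which is exactly the locking cost that 2PC already pays on your behalf.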

On the other hand 2PC is strongly consistent by design, which makes it potentially less efficient due to resource locking.

So which one to use? That depends on your requirements. If you need strong consistency then 2PC. If not then saga is a valid choice, potentially more efficient.

Example 1. Say you create an accounting system where users may transfer money between accounts. Say that those accounts live on separate systems. Furthermore, you have a strict requirement that the balance should always be nonnegative (you don't want to deal with implicit debts) and maybe a strict requirement that a maximum amount can be set and cannot be exceeded (think about dedicated accounts for repaying debts: you cannot put in more money than the entire debt). Then sagas may not be what you want, because due to dirty reads (and other consistency phenomena) we may end up with a balance outside of the allowed range. 2PC will be an easier choice here.

Example 2. Similarly, you have an accounting system. But this time a balance outside of the range is allowed (whoever owns the system will deal with that manually). In that scenario perhaps sagas are better, because manually dealing with a very small number of troublesome states may be less expensive than maintaining strong consistency all the time.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Tuomas Toivonen | View Question on Stackoverflow
Solution 1 - Transactions | Tengiz | View Answer on Stackoverflow
Solution 2 - Transactions | Chris | View Answer on Stackoverflow
Solution 3 - Transactions | freakish | View Answer on Stackoverflow