If you are using a distributed system, you get the drawbacks of a distributed system: ensuring consistency of your data model requires real care. TL;DR: avoid distributed systems if consistency is required, or at least make it possible to ignore inapplicable or duplicate messages: include the expected state in each message, and give each message a unique ID.
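As a rough sketch of what such a message could look like (the field names here are illustrative, not taken from any particular framework):

```python
import uuid
from dataclasses import dataclass

@dataclass(frozen=True)
class DeductCommand:
    # Unique ID so the receiver can discard duplicates after retries.
    message_id: str
    account_id: str
    amount_cents: int
    # The state the sender observed; the receiver rejects the command
    # if the account no longer matches this expectation.
    expected_balance_cents: int
    expected_version: int

command = DeductCommand(
    message_id=str(uuid.uuid4()),
    account_id="acct-42",
    amount_cents=10_000,
    expected_balance_cents=123_400,
    expected_version=7,
)
```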
If you have a choice, consider carefully whether this aspect of your process should really be distributed across three different services A, B, C, or whether it wouldn't make more sense to combine these responsibilities into a single service. That will likely take less development effort than dealing with the problems of distributed systems. One variant of this is to have a single service/instance that executes these transactions atomically, and to use a message queue to buffer pending transactions, as sketched below. That's part of how banks do it. Another variant is to have a single database that offers locks or ACID transactions to all services.
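A minimal sketch of the queue-plus-single-consumer variant, with an in-memory queue standing in for whatever broker you actually use (the names and structure are assumptions, not a prescribed design):

```python
import queue
import threading

pending = queue.Queue()          # stands in for a real broker such as RabbitMQ or SQS
balances = {"acct-42": 123_400}  # the single authoritative state, in cents

def worker():
    # Single consumer: transactions are applied strictly one at a time,
    # so each one sees a consistent state and no two can interleave.
    while True:
        account, amount = pending.get()
        if balances.get(account, 0) >= amount:
            balances[account] -= amount
        pending.task_done()

threading.Thread(target=worker, daemon=True).start()

# Producers (the other services) just enqueue their intentions.
pending.put(("acct-42", 10_000))
pending.join()
print(balances)  # {'acct-42': 113400}
```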
The next best thing is to combine the operations of B and C into a single atomic test-and-set operation. So instead of sending C a message “deduct $100”, you would send a message “assuming the balance is still $1234 as last modified at 2020-08-30T17:44:25Z, deduct $100”. This atomically asserts that the state upon which you executed your business logic is the same state onto which you apply your effect. If the state has changed, you might be able to retry the transaction on the new state. However, it is really hard to guarantee that every request will eventually make progress, so this only works well if the rate of change to the state is reasonably low.
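One common way to express such a test-and-set is a conditional update that only succeeds if the record still looks the way the sender assumed. This sketch uses SQLite and a version column purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('acct-42', 123400, 7)")

def deduct(account_id, amount, expected_version):
    # The WHERE clause carries the assumption about the current state;
    # if another writer got there first, zero rows match and the caller can retry or fail.
    cur = conn.execute(
        "UPDATE accounts SET balance = balance - ?, version = version + 1 "
        "WHERE id = ? AND version = ? AND balance >= ?",
        (amount, account_id, expected_version, amount),
    )
    conn.commit()
    return cur.rowcount == 1  # True only if the expected state still held

print(deduct("acct-42", 10_000, expected_version=7))  # True
print(deduct("acct-42", 10_000, expected_version=7))  # False: the version has moved on
```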
Another technique is to never maintain a mutable state such as a balance, but to record a stream of modification events that can be replayed to get the current state (event sourcing). In your case, you might trigger an event for C that represents an intention to deduct, but whether the deduction event is actually applied depends on the state of C.
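A toy illustration of that idea (the event names and the validation rule are made up for the example):

```python
# Append-only log of events; the balance is never stored, only derived.
events = [("deposited", 123_400)]

def current_balance(log):
    # Replay the full event stream to reconstruct the current state.
    balance = 0
    for kind, amount in log:
        balance += amount if kind == "deposited" else -amount
    return balance

def try_deduct(log, amount):
    # Record the intention, but only apply it if the replayed state allows it.
    if current_balance(log) >= amount:
        log.append(("deducted", amount))
        return True
    return False

print(try_deduct(events, 10_000))   # True
print(current_balance(events))      # 113400
```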
Such events should have a unique ID. If the requests to A also have a unique ID, this would allow A to detect duplicate events. So instead of sending a message “command: do the thing”, it would be better to send “command 62c7b0fd-f7bf-4244-90a9-edc9477301ef: do the thing”. This is a good idea regardless of your service topology, simply because networks are unreliable and commands might need to be retried safely.
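Deduplication then boils down to remembering which IDs have already been processed. A simplistic in-memory version, assuming a real service would persist this set durably alongside its state:

```python
import uuid

processed_ids = set()  # in practice stored in the same transaction as the effect

def handle(command_id: str, do_the_thing):
    # Apply the command at most once; retries with the same ID are ignored.
    if command_id in processed_ids:
        return "duplicate, ignored"
    result = do_the_thing()
    processed_ids.add(command_id)
    return result

cmd_id = str(uuid.uuid4())
print(handle(cmd_id, lambda: "done"))  # executes
print(handle(cmd_id, lambda: "done"))  # duplicate, ignored
```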
As a tangential point, consider the relationship between your architecture and your deployment environment. Ideally, your architecture harmonizes with the environment it is deployed in, e.g. lots of loosely coupled services on multiple right-sized nodes, or a big fat monolith on a big fat machine. But if your environment pushes you towards architectures that make it unnecessarily difficult to solve your actual business-level problems, something is deeply wrong. Don't let Heroku dictate your architecture.
For example, does each node run all of the services A, B, C? Or do the nodes actually share the same datasource? Or does a service C perhaps run only on a single node, and separate nodes are reserved for each service type?