22

I have a bunch of microservices whose functionality I expose through a REST API according to the API Gateway pattern. Since these microservices are Spring Boot applications, I am using Spring AMQP to achieve RPC-style synchronous communication between them. Things have been going smoothly so far. However, the more I read about event-driven microservice architectures and look at projects such as Spring Cloud Stream, the more convinced I become that I may be doing things the wrong way with the synchronous RPC approach (particularly because I will need this to scale to hundreds or thousands of requests per second from client applications).
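
For concreteness, the calls I'm making look roughly like this with Spring AMQP's RabbitTemplate (the exchange name, routing key, and User type here are just placeholders):

    import java.io.Serializable;

    import org.springframework.amqp.rabbit.core.RabbitTemplate;

    // Placeholder domain type; Serializable for the default message converter.
    class User implements Serializable {
        long id;
        String name;
    }

    public class UserClient {
        private final RabbitTemplate rabbitTemplate;

        public UserClient(RabbitTemplate rabbitTemplate) {
            this.rabbitTemplate = rabbitTemplate;
        }

        // Blocks the calling thread until the user service replies, or until
        // the template's reply timeout expires. This is the synchronous
        // coupling I'm worried about.
        public User getUser(long id) {
            return (User) rabbitTemplate.convertSendAndReceive(
                    "userServiceExchange", "user.get", id);
        }
    }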

I understand the point behind an event-driven architecture. What I don't quite understand is how to actually use such a pattern when sitting behind a model (REST) that expects a response to every request. For example, if I have my API gateway as a microservice and another microservice which stores and manages users, how could I model something such as GET /users/1 in a purely event-driven fashion?

3 Answers

12

Repeat after me:

REST and asynchronous events are not alternatives. They're completely orthogonal.

You can have one, or the other, or both, or neither. They're entirely different tools for entirely different problem domains. In fact, general purpose request-response communication is absolutely capable of being asynchronous, event-driven, and fault tolerant.


As a trivial example, the AMQP protocol sends messages over a TCP connection. In TCP, every packet must be acknowledged by the receiver. If a sender of a packet doesn't receive an ACK for that packet, it keeps resending that packet until it's ACK'd or until the application layer "gives up" and abandons the connection. This is clearly a non-fault-tolerant request-response model because every "packet send request" must have an accompanying "packet acknowledge response", and failure to respond results in the entire connection failing. Yet AMQP, a standardized and widely adopted protocol for asynchronous fault tolerant messaging, is communicated over TCP! What gives?

The core concept at play here is that scalable loosely-coupled fault-tolerant messaging is defined by what messages you send, not how you send them. In other words, loose coupling is defined at the application layer.

Let's look at two parties communicating either directly with RESTful HTTP or indirectly with an AMQP message broker. Suppose Party A wishes to upload a JPEG image to Party B who will sharpen, compress, or otherwise enhance the image. Party A doesn't need the processed image immediately, but does require a reference to it for future use and retrieval. Here's one way that might go in REST:

  • Party A sends an HTTP POST request message to Party B with Content-Type: image/jpeg
  • Party B processes the image (for a long time if it's large) while Party A waits, possibly doing other things
  • Party B sends an HTTP 201 Created response message to Party A with a Content-Location: <url> header which links to the processed image
  • Party A considers its work done since it now has a reference to the processed image
  • Sometime in the future when Party A needs the processed image, it GETs it using the link from the earlier Content-Location header

The 201 Created response code tells a client that not only was their request successful, it also created a new resource. In a 201 response, the Content-Location header is a link to the created resource. This is specified in RFC 7231 Sections 6.3.2 and 3.1.4.2.
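
To make this concrete, here's a rough Spring MVC sketch of Party B's synchronous endpoint. The /images paths and the sharpen step are invented for illustration:

    import java.util.Map;
    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;

    import org.springframework.http.HttpStatus;
    import org.springframework.http.ResponseEntity;
    import org.springframework.web.bind.annotation.*;

    @RestController
    public class ImageController {

        private final Map<String, byte[]> store = new ConcurrentHashMap<>();

        // Synchronous flow: process while the client waits, then answer
        // 201 Created with a Content-Location link to the processed image.
        @PostMapping(path = "/images", consumes = "image/jpeg")
        public ResponseEntity<Void> upload(@RequestBody byte[] jpeg) {
            String id = UUID.randomUUID().toString();
            store.put(id, sharpen(jpeg)); // the long-running work happens here
            return ResponseEntity.status(HttpStatus.CREATED)
                    .header("Content-Location", "/images/" + id)
                    .build();
        }

        @GetMapping(path = "/images/{id}", produces = "image/jpeg")
        public byte[] get(@PathVariable String id) {
            return store.get(id);
        }

        // Stand-in for the real enhancement step.
        private byte[] sharpen(byte[] jpeg) {
            return jpeg;
        }
    }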

Now, let's see how this interaction works over a hypothetical RPC protocol on top of AMQP:

  • Party A sends an AMQP message broker (call it Messenger) a message containing the image and instructions to route it to Party B for processing, then respond to Party A with an address of some sort for the image
  • Party A waits, possibly doing other things
  • Messenger sends Party A's original message to Party B
  • Party B processes the message
  • Party B sends Messenger a message containing an address for the processed image and instructions to route that message to Party A
  • Messenger sends Party A the message from Party B containing the processed image address
  • Party A considers its work done since it now has a reference to the processed image
  • Sometime in the future when Party A needs the image, it retrieves the image using the address (possibly by sending messages to some other party)

Do you see the problem here? In both cases, Party A can't get an image address until after Party B processes the image. Yet Party A doesn't need the image right away and, by all rights, couldn't care less if processing is finished yet!

We can fix this pretty easily in the AMQP case by having Party B tell A that B accepted the image for processing, giving A an address for where the image will be after processing completes. Then Party B can send A a message sometime in the future indicating the image processing is finished. AMQP messaging to the rescue!
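
Sketched with Spring AMQP, Party B's side of that improved exchange might look something like this. The queue, exchange, and routing key names are made up:

    import java.util.Map;
    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;

    import org.springframework.amqp.rabbit.annotation.RabbitListener;
    import org.springframework.amqp.rabbit.core.RabbitTemplate;
    import org.springframework.stereotype.Component;

    @Component
    public class ImageWorker {

        private final RabbitTemplate rabbitTemplate;
        private final Map<String, byte[]> store = new ConcurrentHashMap<>();

        public ImageWorker(RabbitTemplate rabbitTemplate) {
            this.rabbitTemplate = rabbitTemplate;
        }

        // Reply immediately with where the image *will* be, then process in
        // the background and notify Party A once processing is finished.
        @RabbitListener(queues = "image.process")
        public String accept(byte[] jpeg) {
            String address = "/images/" + UUID.randomUUID();
            new Thread(() -> {
                store.put(address, sharpen(jpeg)); // the slow part
                rabbitTemplate.convertAndSend(
                        "notifications", "image.done", address);
            }).start();
            return address; // sent back to Party A as the reply message
        }

        private byte[] sharpen(byte[] jpeg) {
            return jpeg;
        }
    }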

Except guess what: you can achieve the same thing with REST. In the AMQP example we changed a "here's the processed image" message to a "the image is processing, you can get it later" message. To do that in RESTful HTTP, we'll use the 202 Accepted code and Content-Location again:

  • Party A sends an HTTP POST message to Party B with Content-Type: image/jpeg
  • Party B immediately sends back a 202 Accepted response which contains some sort of "asynchronous operation" content which describes whether processing is finished and where the image will be available when it's done processing. Also included is a Content-Location: <link> header which, in a 202 Accepted response, is a link to the resource represented by whatever the response body is. In this case, that means it's a link to our asynchronous operation!
  • Party A considers its work done since it now has a reference to where the processed image will be once processing completes
  • Sometime in the future when Party A needs the processed image, it first GETs the async operation resource linked to in the Content-Location header to determine if processing is finished. If so, Party A then uses the link in the async operation itself to GET the processed image.
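
Here's a rough Spring MVC sketch of that asynchronous flow. The /operations resource and its shape are invented for illustration:

    import java.util.Map;
    import java.util.UUID;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ConcurrentHashMap;

    import org.springframework.http.HttpStatus;
    import org.springframework.http.ResponseEntity;
    import org.springframework.web.bind.annotation.*;

    @RestController
    public class AsyncImageController {

        record Operation(boolean done, String imageLocation) {}

        private final Map<String, Operation> operations = new ConcurrentHashMap<>();
        private final Map<String, byte[]> images = new ConcurrentHashMap<>();

        // Accept the upload, answer 202 immediately, process in the background.
        @PostMapping(path = "/images", consumes = "image/jpeg")
        public ResponseEntity<Operation> upload(@RequestBody byte[] jpeg) {
            String opId = UUID.randomUUID().toString();
            String imageUrl = "/images/" + opId;
            operations.put(opId, new Operation(false, imageUrl));
            CompletableFuture.runAsync(() -> {
                images.put(opId, jpeg); // stand-in for the real processing
                operations.put(opId, new Operation(true, imageUrl));
            });
            return ResponseEntity.status(HttpStatus.ACCEPTED)
                    .header("Content-Location", "/operations/" + opId)
                    .body(operations.get(opId));
        }

        // Party A polls this just before it actually needs the image.
        @GetMapping("/operations/{id}")
        public Operation status(@PathVariable String id) {
            return operations.get(id);
        }

        @GetMapping(path = "/images/{id}", produces = "image/jpeg")
        public byte[] image(@PathVariable String id) {
            return images.get(id);
        }
    }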

The only difference here is that in the AMQP model, Party B tells Party A when the image processing is done, whereas in the REST model, Party A checks whether processing is done just before it actually needs the image. These approaches are equivalently scalable: as the system gets larger, the number of messages sent in the async AMQP and async REST strategies grows with the same asymptotic complexity. The only difference is that the client sends the extra message instead of the server.

But the REST approach has a few more tricks up its sleeve: dynamic discovery and protocol negotiation. Consider how both the sync and async REST interactions started. Party A sent the exact same request to Party B, with the only difference being the particular kind of success message that Party B responded with. What if Party A wanted to choose whether image processing was synchronous or asynchronous? What if Party A doesn't know if Party B is even capable of async processing?

Well, HTTP actually has a standardized protocol for this already! It's called HTTP preferences, specifically the respond-async preference of RFC 7240 Section 4.1. If Party A desires an asynchronous response, it includes a Prefer: respond-async header with its initial POST request. If Party B decides to honor this request, it sends back a 202 Accepted response that includes a Preference-Applied: respond-async header. Otherwise, Party B simply ignores the Prefer header and sends back 201 Created as it normally would.
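
A minimal sketch of that negotiation in a Spring controller, with hypothetical processing helpers:

    import org.springframework.http.HttpStatus;
    import org.springframework.http.ResponseEntity;
    import org.springframework.web.bind.annotation.*;

    @RestController
    public class NegotiatingImageController {

        // One endpoint, two contracts: 201 Created by default, 202 Accepted
        // when the client sends Prefer: respond-async and we honor it.
        @PostMapping(path = "/images", consumes = "image/jpeg")
        public ResponseEntity<Void> upload(
                @RequestHeader(name = "Prefer", required = false) String prefer,
                @RequestBody byte[] jpeg) {

            if (prefer != null && prefer.contains("respond-async")) {
                String opUrl = startAsyncProcessing(jpeg); // hypothetical helper
                return ResponseEntity.status(HttpStatus.ACCEPTED)
                        .header("Preference-Applied", "respond-async")
                        .header("Content-Location", opUrl)
                        .build();
            }
            String imageUrl = processNow(jpeg); // hypothetical helper
            return ResponseEntity.status(HttpStatus.CREATED)
                    .header("Content-Location", imageUrl)
                    .build();
        }

        private String startAsyncProcessing(byte[] jpeg) { return "/operations/1"; }
        private String processNow(byte[] jpeg) { return "/images/1"; }
    }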

This allows Party A to negotiate with the server, dynamically adapting to whatever image processing implementation it happens to be talking to. Furthermore, the use of explicit links means Party A doesn't have to know about any parties other than B: no AMQP message broker, no mysterious Party C that knows how to actually turn the image address into image data, no second B-Async party if both synchronous and asynchronous requests need to be made, etc. It simply describes what it needs, what it would optionally like, and then reacts to status codes, response content, and links. Add in Cache-Control headers for explicit instructions on when to keep local copies of data, and now servers can negotiate with clients which resources clients may keep local (or even offline!) copies of. This is how you build loosely-coupled fault-tolerant microservices in REST.
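
As a small illustration of that last point, a Spring MVC handler can declare cacheability like so (the one-hour lifetime is arbitrary):

    import java.time.Duration;

    import org.springframework.http.CacheControl;
    import org.springframework.http.ResponseEntity;
    import org.springframework.web.bind.annotation.*;

    @RestController
    public class CachedImageController {

        // Tell clients they may keep a local copy for an hour, so repeat
        // GETs never have to touch this service at all.
        @GetMapping(path = "/images/{id}", produces = "image/jpeg")
        public ResponseEntity<byte[]> image(@PathVariable String id) {
            return ResponseEntity.ok()
                    .cacheControl(CacheControl.maxAge(Duration.ofHours(1)))
                    .body(load(id));
        }

        // Stand-in for the real image store.
        private byte[] load(String id) {
            return new byte[0];
        }
    }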

1
  • What you seem to be describing is a polling architecture. The client can start an async processing request on the server. The server goes off and processes the request and finishes processing at some point. In the meantime, the client gets to the point where it wants the processing results. The client tries to get it from the server, but it isn't done, so it comes back again later. That is polling.
    – rmustakos
    Commented Oct 29, 2020 at 17:12
2

Whether or not you need to be purely event-driven depends, of course, on your specific scenario. Assuming that you really do need to be, you could solve the problem by:

Storing a local, read-only copy of the data by listening for the different events and capturing the information in their payloads. While this gives you fast(er) reads for that data, stored in a form suited to that exact application, it also means your data will only be eventually consistent across the services.

To model GET /users/1 with this approach, one might listen for the UserCreated and UserUpdated events and store the useful subset of the user's data in the service. When you then need to get that user's information, you can simply query your local data store.
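
A rough sketch of that read model, assuming the events arrive via Spring AMQP; the event types and queue name are invented:

    import java.io.Serializable;
    import java.util.Map;
    import java.util.Optional;
    import java.util.concurrent.ConcurrentHashMap;

    import org.springframework.amqp.rabbit.annotation.RabbitHandler;
    import org.springframework.amqp.rabbit.annotation.RabbitListener;
    import org.springframework.stereotype.Component;

    // Invented event payloads published by the user service.
    record UserCreated(long id, String name) implements Serializable {}
    record UserUpdated(long id, String name) implements Serializable {}

    @Component
    @RabbitListener(queues = "gateway.user-events") // made-up queue name
    public class UserReadModel {

        // Local, eventually consistent copy of just the fields we need.
        private final Map<Long, String> usersById = new ConcurrentHashMap<>();

        @RabbitHandler
        public void on(UserCreated event) {
            usersById.put(event.id(), event.name());
        }

        @RabbitHandler
        public void on(UserUpdated event) {
            usersById.put(event.id(), event.name());
        }

        // GET /users/1 can now be answered from local data, no remote call.
        public Optional<String> nameOf(long id) {
            return Optional.ofNullable(usersById.get(id));
        }
    }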

For a minute, let's assume that the service which exposes the /users/ endpoint doesn't publish any sort of events. In this instance, you could achieve a similar thing by simply caching the responses to the HTTP requests you make, thus negating the need to make more than one HTTP request per user within some time frame.
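
A minimal sketch of that caching fallback, with a made-up TTL and RestTemplate as the HTTP client:

    import java.time.Duration;
    import java.time.Instant;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import org.springframework.web.client.RestTemplate;

    public class CachingUserClient {

        private record Entry(String json, Instant fetchedAt) {}

        private static final Duration TTL = Duration.ofSeconds(30); // made up

        private final RestTemplate http = new RestTemplate();
        private final Map<Long, Entry> cache = new ConcurrentHashMap<>();

        // At most one HTTP request per user per TTL window.
        public String getUser(long id) {
            Entry cached = cache.get(id);
            if (cached != null
                    && cached.fetchedAt().plus(TTL).isAfter(Instant.now())) {
                return cached.json();
            }
            String json = http.getForObject(
                    "http://user-service/users/" + id, String.class);
            cache.put(id, new Entry(json, Instant.now()));
            return json;
        }
    }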

3
  • I understand. But what about error handling (and reporting) to the clients in this scenario? Commented Jun 2, 2016 at 9:15
  • I mean, how do I report back to REST clients errors which occur when handling the UserCreated event (for example, duplicate username or email or database outage). Commented Jun 2, 2016 at 10:09
  • It depends on where you're performing the action. If you're inside the user system, you can do all your validation and write to the data store there, then publish the event. Otherwise, I see it as perfectly acceptable to perform a standard HTTP request to the /users/ endpoint, allow that system to publish its event if it succeeds, and respond to the request with the new entity
    – Andy Hunt
    Commented Jun 2, 2016 at 10:40
1

With an event-sourced system, the asynchronous aspects normally come into play when something that represents state, such as a database or an aggregated view of some data, is changed. Using your example, a call to GET /api/users could simply return the response from a service that holds an up-to-date representation of the list of users in the system. In another scenario, the request to GET /api/users could cause a service to apply the stream of events recorded since the last snapshot of users to build a new snapshot, and simply return the result. An event-driven system isn't necessarily purely asynchronous from request to response, but tends to be asynchronous at the level where services need to interact with other services. It often doesn't make sense to answer a GET request asynchronously, so you can simply return the response of a service, regardless of how that response is computed.
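
A rough sketch of that snapshot-plus-events fold; the event and snapshot types are invented:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Invented types: a stored snapshot plus the events recorded after it.
    record UserEvent(long userId, String name, boolean deleted) {}
    record Snapshot(long version, Map<Long, String> users) {}

    public class UserProjection {

        // Fold the events since the last snapshot into a fresh view of the
        // users; a GET can then simply return this result.
        public Map<Long, String> currentUsers(Snapshot snapshot,
                                              List<UserEvent> eventsSince) {
            Map<Long, String> users = new HashMap<>(snapshot.users());
            for (UserEvent e : eventsSince) {
                if (e.deleted()) {
                    users.remove(e.userId());
                } else {
                    users.put(e.userId(), e.name()); // create or update
                }
            }
            return users;
        }
    }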
