
Recently we had a client ask for an API gateway solution, in this case Kong. They currently have 10 services (200 APIs) running on really legacy stuff (built with C++ and Fortran). It's a pain for third parties to integrate with, hence they asked for an API gateway in front to handle authentication, rate limiting, etc.

However, there's one thing that I can't grasp, and personally I don't think they do either. Because of the low QPS, they specifically asked for a message queue server that sits between the API gateway and the rest of the services. The reasoning is that we should hold incoming requests in the queue and only release them to the legacy services once they are done with the previous requests, as in the diagram below.

[architecture diagram]

I think the intention is good: pausing incoming requests so as not to further burden the legacy services. But given the nature of an MQ, doesn't that mean the existing frontend applications will need to change the way they handle request/response too, to something event-based, callback-style, or socket notifications?

Also, putting a queue in the middle essentially adds unnecessary complexity to the entire platform, right? Personally, I think what they should do is rework the existing services, identify the bottlenecks, and make them scalable.

I've googled quite a bit and can't find any relevant implementation. Is this an anti-pattern, or am I missing something? Please let me know what you all think. Thanks in advance!


2 Answers


Throttling requests to the backend is in general a reasonable idea if there is evidence that too many parallel requests would lead to problems. However, your suspicion is right: message queues are for asynchronous services, and they don't serve a meaningful function in an API gateway, which is presumably synchronous (unless you are working with "tasks" as the API objects, where the endpoints serve to start up and monitor tasks and fetch results later).

What could be done instead is to implement load limiting for peak loads on the backend interface of the API gateway (I don't know whether Kong supports that directly, though). When the load limit (measured in parallel in-flight requests, for example) is reached, you either delay new incoming requests or refuse them altogether with a 502 or 503 HTTP status. However, without sufficient capacity in the backend, this is only useful for buffering quick bursts; incoming load consistently higher than what the backend can manage will lead to seriously degraded service.
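
To make the idea concrete, here is a minimal sketch of such a load limiter, written as a plain Python WSGI middleware rather than anything Kong-specific; MAX_IN_FLIGHT and the 503 response are illustrative values, not settings from any particular gateway:

```python
import threading

MAX_IN_FLIGHT = 20  # assumed cap on parallel requests the legacy backend tolerates
_slots = threading.BoundedSemaphore(MAX_IN_FLIGHT)

class LoadLimiter:
    """Refuses requests with 503 once the in-flight limit is reached."""

    def __init__(self, app):
        self.app = app  # the downstream WSGI app (e.g. a proxy to the backend)

    def __call__(self, environ, start_response):
        # Try to grab a slot without blocking; refuse the request if none is free.
        if not _slots.acquire(blocking=False):
            start_response("503 Service Unavailable",
                           [("Retry-After", "5"), ("Content-Type", "text/plain")])
            return [b"Backend at capacity, retry later\n"]
        try:
            result = self.app(environ, start_response)
            return list(result)  # materialize so the slot is held for the whole response
        finally:
            _slots.release()
```

Delaying instead of refusing would just mean acquiring the semaphore with a bounded timeout before falling back to the 503.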

So the proper solution would be to increase the capacity of the backend, for example by running parallel instances. How this can be done depends highly on your application and on how well the actual service and its database are designed, so it does not come for free, but it should be doable in most cases.


This can be done without changing the front end. What you need is an intermediary that takes the received request, places it on the queue, and then, when the associated response arrives on a response queue, returns it.
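
A minimal sketch of that intermediary, using in-process queue.Queue objects as stand-ins for the real request/response queues on a broker (the correlation-ID matching is the essential part; broker and serialization details are left out):

```python
import queue
import threading
import uuid

request_queue = queue.Queue()   # stand-in for the broker's request queue
pending = {}                    # correlation_id -> per-request reply queue
pending_lock = threading.Lock()

def legacy_worker():
    """Simulates the legacy service consuming one request at a time."""
    while True:
        corr_id, payload = request_queue.get()
        result = f"processed:{payload}"      # placeholder for the real work
        with pending_lock:
            reply_q = pending.pop(corr_id, None)
        if reply_q is not None:              # caller may have timed out and gone away
            reply_q.put(result)

def call_via_queue(payload, timeout=5.0):
    """Looks synchronous to the caller, queue-based underneath."""
    corr_id = str(uuid.uuid4())
    reply_q = queue.Queue(maxsize=1)
    with pending_lock:
        pending[corr_id] = reply_q
    request_queue.put((corr_id, payload))
    try:
        return reply_q.get(timeout=timeout)  # block until the reply arrives
    except queue.Empty:
        with pending_lock:
            pending.pop(corr_id, None)       # abandon; the backend work still happens
        raise TimeoutError("backend did not respond in time")

threading.Thread(target=legacy_worker, daemon=True).start()
print(call_via_queue("GET /orders/42"))
```

Note the except branch: the abandoned request is still sitting on the queue, which is exactly the failure mode described below.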

I have experience with a design exactly like this and while it can be made to work, there are a number of practical challenges around it.

One of the problems we had with this was that, under load, clients (or the API gateway) may time out and abandon the request. However, that request will still put load on the back-end systems, and often a client will retry the failed request. This can lead to a downward spiral of requests backing up on the queue(s), where none are satisfied in a timely manner.

You can have the client remove any timeout, but that isn't always possible. I don't know Kong, but on AWS the API Gateway has a hard limit on how long a request can take. Even where you can remove the timeout, it creates another issue if the back-end processing fails and never puts a message on the response queue.

You can have your intermediary respond with an error if a request has already exceeded some threshold, but this still adds some load and can delay the processing of other requests in the queue. It can also be tricky to determine what the threshold should be unless the response times on the back end are highly predictable. For example, it's obvious that if a message is older than the gateway timeout, you should not attempt the transaction, but what if the timeout is 30 seconds and you pick up a message that is 29 seconds old?
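
If you do go down that road, the check itself is simple; the judgment call is the safety margin. A rough sketch, where GATEWAY_TIMEOUT and SAFETY_MARGIN are assumed numbers for illustration only:

```python
import time

GATEWAY_TIMEOUT = 30.0  # seconds before the gateway/client gives up (assumed)
SAFETY_MARGIN = 5.0     # assumed worst-case backend processing time

def should_process(enqueued_at, now=None):
    """Only worth attempting if we can still finish inside the gateway timeout."""
    now = time.time() if now is None else now
    age = now - enqueued_at
    return age + SAFETY_MARGIN < GATEWAY_TIMEOUT
```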

For these reasons, I would not recommend this approach, but YMMV. If possible, I would look to do some sort of automated scaling, e.g. with containers. If there's no way to scale or otherwise address the required load directly, I would go with a POST that responds with a URI that supports GET and returns the status and/or response. This, as you note, requires changing the client interaction.
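
For reference, a minimal sketch of that POST-then-poll pattern, assuming a small Flask app with an in-memory task store (route names and payloads are made up for illustration):

```python
import threading
import uuid

from flask import Flask, jsonify, url_for

app = Flask(__name__)
tasks = {}  # task_id -> {"status": ..., "result": ...}

def run_job(task_id, payload):
    # Placeholder for the slow call into the legacy service.
    tasks[task_id]["result"] = f"processed:{payload}"
    tasks[task_id]["status"] = "done"

@app.post("/orders")
def submit():
    task_id = str(uuid.uuid4())
    tasks[task_id] = {"status": "pending", "result": None}
    threading.Thread(target=run_job, args=(task_id, "new-order"), daemon=True).start()
    # 202 Accepted plus a URI the client polls for the outcome.
    return jsonify({"status_url": url_for("status", task_id=task_id)}), 202

@app.get("/orders/status/<task_id>")
def status(task_id):
    task = tasks.get(task_id)
    if task is None:
        return jsonify({"error": "unknown task"}), 404
    return jsonify(task), 200

if __name__ == "__main__":
    app.run()
```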

This kind of queue-intermediated design was more common before 64-bit machines were ubiquitous (due to constraints on addressable memory). Given the technologies involved, this idea might be a holdover from that time.

  • Thanks for sharing. Your explanation is what I had in mind too. I discussed this with the client and they insisted on proceeding with this design (without considering any upgrades to their legacy backend). With that being said, containerisation and auto-scaling are not possible and we will just have to make do with the queue... sigh...
    – Rex Low
    Commented Jan 4, 2023 at 4:23
  • @RexLow In that case, make sure you build in handling for a backed-up queue and all the other best practices around queues, such as poison-message handling. Some of the issues we had were due to poor design of the queueing subsystem. It had a 'feature' that, when getting the next message, would check every single message on the queue to see whether it was expired. As the queue backed up, each successive read became slower, leading to a bigger backup. I could never figure out why it was built that way. Do your research and make sure you pick a good queuing platform, I guess.
    Commented Jan 4, 2023 at 17:15
