I've just read this article, and I'm confused.
Let's imagine one webapp and one distinct application acting as a "worker", both sharing the same database.
Oh, I said "sharing"..but what does the article warns about? :
Fourthly, sharing a database between applications (or services) is a bad thing. It’s just too tempting to put amorphous shared state in there and before you know it you’ll have a hugely coupled monster.
=> I disagree. There are cases where distinct applications are still part of the same unit, and in that case the notion of a "coupling issue" makes no sense.
Let's continue: the webapp handles client HTTP requests and may, at any time, update some aggregates (a DDD term), generating the corresponding domain events.
The goal of the worker would be to handle those domain events by processing the needed jobs.
The point is:
How should the event data be passed to the worker?
The first solution, as the article promotes, would be to use RabbitMQ, a great message-oriented middleware.
The workflow would be simple:
Any time the web dyno generates an event, it publishes it through RabbitMQ, which feeds the worker.
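To make this concrete, here is what that workflow could look like on the webapp side. This is only a sketch under my own assumptions: a Python publisher using pika and a queue named domain_events, neither of which comes from the article.

```python
import json

import pika

# Hypothetical setup: a local RabbitMQ broker and a durable queue named "domain_events".
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="domain_events", durable=True)


def on_aggregate_updated(event: dict) -> None:
    """Called by the webapp right after it has updated an aggregate."""
    channel.basic_publish(
        exchange="",                 # default exchange, routed by queue name
        routing_key="domain_events",
        body=json.dumps(event),
        properties=pika.BasicProperties(delivery_mode=2),  # persistent message
    )
```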
The drawback is that nothing guarantees immediate consistency between the commit of the aggregate update and the publication of the event: a sending failure or a hardware issue can occur between the two, and that is the main issue.
Example: an event could be published even though the aggregate update failed, resulting in an event that gives a false representation of the domain model.
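To show exactly where the orderings break, here is a toy Python version of that naive dual write. Every name here (the aggregates table, the queue, the helper) is hypothetical and only serves to point at the two failure cases:

```python
import json

import pika
import psycopg2

conn = psycopg2.connect("dbname=app")  # the shared database (PostgreSQL assumed)
channel = pika.BlockingConnection(
    pika.ConnectionParameters(host="localhost")).channel()


def handle_command(aggregate_id: int, new_state: str) -> None:
    cur = conn.cursor()
    cur.execute("UPDATE aggregates SET state = %s WHERE id = %s",
                (new_state, aggregate_id))
    # Publish first, commit second: if commit() fails, the event below has
    # already gone out and describes a state that was never persisted.
    channel.basic_publish(exchange="", routing_key="domain_events",
                          body=json.dumps({"id": aggregate_id, "state": new_state}))
    conn.commit()
    # Swapping the two calls gives the opposite failure: the commit succeeds,
    # the publish fails, and the event is silently lost.
```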
You could argue that global XA (two-phase commit) exists, but it's not a solution that fits every database or message broker.
So what could be a good solution to ensure this immediate consistency?
IMO, storing the event in the database, in the same local transaction as the aggregate update.
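Something along these lines (again a sketch: the domain_events outbox table, its columns and the PostgreSQL/psycopg2 choice are my own assumptions):

```python
import json

import psycopg2

conn = psycopg2.connect("dbname=app")


def update_aggregate_and_record_event(aggregate_id: int, new_state: str) -> None:
    cur = conn.cursor()
    # Both writes happen inside the same local transaction...
    cur.execute("UPDATE aggregates SET state = %s WHERE id = %s",
                (new_state, aggregate_id))
    cur.execute("INSERT INTO domain_events (aggregate_id, payload, published) "
                "VALUES (%s, %s, FALSE)",
                (aggregate_id, json.dumps({"id": aggregate_id, "state": new_state})))
    # ...so this single commit is atomic: either the aggregate update and its
    # event are both persisted, or neither is.
    conn.commit()
```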
A simple asynchronous scheduler would then be responsible for querying the currently unpublished events from the database and sending them to RabbitMQ, which in turn feeds the worker.
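A sketch of that scheduler, with the same assumed table and queue names as above:

```python
import time

import pika
import psycopg2

conn = psycopg2.connect("dbname=app")
channel = pika.BlockingConnection(
    pika.ConnectionParameters(host="localhost")).channel()
channel.queue_declare(queue="domain_events", durable=True)


def relay_forever(poll_interval: float = 1.0) -> None:
    while True:
        cur = conn.cursor()
        cur.execute("SELECT id, payload FROM domain_events "
                    "WHERE published = FALSE ORDER BY id")
        for event_id, payload in cur.fetchall():
            # payload is already a JSON string written by the webapp's transaction
            channel.basic_publish(exchange="", routing_key="domain_events",
                                  body=payload)
            cur.execute("UPDATE domain_events SET published = TRUE WHERE id = %s",
                        (event_id,))
        conn.commit()
        time.sleep(poll_interval)
```

Note that if the relay crashes between the publish and the commit, the event is re-sent on the next pass, so the worker has to tolerate duplicates (at-least-once delivery).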
But why would we need an extra scheduler on the webapp side, and for that matter, why would we need RabbitMQ at all in this case?
With this solution, it seems logical that RabbitMQ could be unnecessary, especially because the database is shared.
Indeed, whatever the case, we saw that immediate consistency involves polling the database.
Thus, why wouldn't the worker be directly responsible for this polling?
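In other words, the worker itself could run something like this (same assumed schema as above; FOR UPDATE SKIP LOCKED is PostgreSQL-specific and only one possible way to let several worker processes poll safely):

```python
import json
import time

import psycopg2

conn = psycopg2.connect("dbname=app")


def process(event: dict) -> None:
    ...  # the worker's actual job for this domain event


def work_forever(poll_interval: float = 1.0) -> None:
    while True:
        cur = conn.cursor()
        # SKIP LOCKED lets several workers poll the same table without
        # picking up the same event twice.
        cur.execute("SELECT id, payload FROM domain_events "
                    "WHERE published = FALSE ORDER BY id "
                    "LIMIT 10 FOR UPDATE SKIP LOCKED")
        for event_id, payload in cur.fetchall():
            process(json.loads(payload))
            cur.execute("UPDATE domain_events SET published = TRUE WHERE id = %s",
                        (event_id,))  # the flag now simply means "handled"
        conn.commit()
        time.sleep(poll_interval)
```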
Therefore, I wonder why so many articles on the web harshly criticize database queuing while promoting message-oriented middleware.
Excerpt from the article:
Simple, use the right tool for the job: this scenario is crying out for a messaging system. It solves all the problems described above; no more polling, efficient message delivery, no need to clear completed messages from queues, and no shared state.
And immediate consistency, just ignored?
To sum up, it really seems that whatever the case, shared database or not, we need database polling.
Did I miss some critical notions?
Thanks