Comment by sgarland
1 year ago
At low-to-medium scale, this will be fine. It holds up even at higher scale, so long as you monitor autovacuum performance on the queue table.
At some point it may become practical to bring a dedicated queue system into the stack, sure, but this can massively simplify things when you don’t need or want the additional complexity.
Aside from that, the main advantage of this is transactions. I can do:
And it's guaranteed that both the row and job for Elasticsearch update are inserted.
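A minimal sketch of that pattern, not the commenter's actual code: it uses Python's built-in `sqlite3` in place of Postgres so it runs standalone, and the `articles` and `jobs` tables and the `create_article` helper are hypothetical names chosen for illustration.

```python
import json
import sqlite3

# Assumption: "articles" is the business table and "jobs" is the queue table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE jobs (id INTEGER PRIMARY KEY, kind TEXT NOT NULL, payload TEXT NOT NULL);
""")

def create_article(conn, title, fail=False):
    """Insert the row and its follow-up job in one transaction."""
    # "with conn" commits on success and rolls back if an exception escapes.
    with conn:
        cur = conn.execute("INSERT INTO articles (title) VALUES (?)", (title,))
        article_id = cur.lastrowid
        if fail:
            # Simulates a crash between the two inserts.
            raise RuntimeError("simulated failure")
        conn.execute(
            "INSERT INTO jobs (kind, payload) VALUES (?, ?)",
            ("elasticsearch_update", json.dumps({"article_id": article_id})),
        )
    return article_id
```

If the second insert never happens, the first one is rolled back too, so there is never an article without its job or a job without its article.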
If you use a dedicated queue system then this becomes a lot trickier:
There are of course also situations where this doesn't apply, but this "insert row(s) in SQL and then queue job to do more with that" is a fairly common use case for queues, and in those cases this is a great choice.
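To make the failure mode concrete, here is a rough sketch of the dual write. An in-memory SQLite database stands in for the SQL store, and `FlakyQueue` is a made-up stand-in for an external broker client; neither comes from the original comment.

```python
import sqlite3

class FlakyQueue:
    """Hypothetical stand-in for an external broker client."""

    def __init__(self, available=True):
        self.available = available
        self.messages = []

    def publish(self, message):
        if not self.available:
            raise ConnectionError("broker unreachable")
        self.messages.append(message)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")

def create_article_dual_write(db, queue, title):
    with db:  # the database commit happens when this block exits
        cur = db.execute("INSERT INTO articles (title) VALUES (?)", (title,))
    # The commit is already durable here; if publish fails, the row has no job.
    queue.publish({"kind": "elasticsearch_update", "article_id": cur.lastrowid})

broker = FlakyQueue(available=False)
try:
    create_article_dual_write(db, broker, "orphaned")
except ConnectionError:
    pass

orphaned_rows = db.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
print(orphaned_rows, len(broker.messages))  # prints "1 0"
```

Publishing before the commit only flips the problem: the job can then exist for a row that gets rolled back.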
Transactional Outbox solves this. You use a table like in the first example, but instead of actually doing the Elasticsearch update from it, the Outbox table is piped into the dedicated queue.
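A rough sketch of the outbox idea, again with `sqlite3` standing in for the real database. The `outbox` table layout, the `sent` flag, and the `relay_outbox` helper are assumptions for illustration, and `publish` is whatever callable hands a message to the dedicated queue.

```python
import json
import sqlite3

store = sqlite3.connect(":memory:")
store.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY,
        payload TEXT NOT NULL,
        sent INTEGER NOT NULL DEFAULT 0
    );
""")

def create_article_with_outbox(store, title):
    """Write the row and its outbox entry in one transaction."""
    with store:
        cur = store.execute("INSERT INTO articles (title) VALUES (?)", (title,))
        message = {"kind": "elasticsearch_update", "article_id": cur.lastrowid}
        store.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(message),))

def relay_outbox(store, publish):
    """Forward unsent outbox rows via publish(); return how many were forwarded."""
    pending = store.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    forwarded = 0
    for row_id, payload in pending:
        publish(json.loads(payload))  # may raise; the row then stays unsent
        with store:
            store.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
        forwarded += 1
    return forwarded
```

Note that this gives at-least-once delivery: if the relay dies after publishing but before marking the row as sent, the message goes out again on the next run, so consumers should be idempotent.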
Most of these two phase problems can be solved by having separate queue consumers.
And as far as I can tell, this is only a perk when your two actions are "mutate the colocated database" and "do X". For all other situations this seems like a downgrade.
Do you mean like the consumer for the first phase enqueues a job for the second phase?
I agree, there is no need for FANG-level infrastructure. IMO, in most cases the simplicity/performance tradeoff is worth it at small to medium scale. There is also statistics tooling that helps you monitor throughput and failure rates (aggregated on a per-second basis).