Comment by cle
1 month ago
This comes up every time this conversation happens.
Yes, PG can theoretically handle just about anything with the right configuration, schema, architecture, etc.
Finding that right configuration is not trivial. Even dedicated frameworks like Graphile struggle with it.
My startup had the exact same struggles with PG and did the same migration to BullMQ because we were sick of fiddling with it instead of solving business problems. We are very glad we migrated off of PG for our work queues.
The issue is that "83 per second" is multiple orders of magnitude off the expected level of performance on any RDBMS running on anything resembling modern hardware.
I haven't worked with Graphile, but this just doesn't pass the sniff test unless those 83 jobs per second somehow translate into thousands of write transactions per second.
As an indication, their documentation has a performance section with a benchmark that claims to process 10k jobs per second on a pretty modest machine.
> The issue is that "83 per second" is multiple orders of magnitude off the expected level of performance on any RDBMS running on anything resembling modern hardware.
This is just not true; there are plenty of scenarios where 83/sec would be the limit. That number by itself is almost meaningless, much like benchmarks, which also bake in a bunch of assumptions about workloads and runtime environments.
As a simple example: if your queue has a large backlog, you have a large worker fleet aggressively pulling work to minimize latency, your payloads are large, your indexing isn't tuned, and/or you have many jobs scheduled for the future, every acquire can turn into an expensive table scan.
(I'm being specific here because this is one of many failure scenarios I've encountered with Graphile that can cause your DB to melt down. The same workload in Redis barely causes a blip in CPU, without having to fiddle with indexes, autovacuum, and worker backoffs.)
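To make the table-scan scenario concrete, here's a rough sketch of the kind of acquire query a Postgres-backed queue typically runs, and the kind of partial index that keeps it cheap. Table and column names are hypothetical; this is not Graphile Worker's actual SQL.

```sql
-- Hypothetical schema: jobs(id, payload, priority, run_at, locked_at).
-- A typical "grab the next runnable job" acquire, using SKIP LOCKED so
-- concurrent workers don't block each other:
SELECT id, payload
FROM jobs
WHERE run_at <= now()
  AND locked_at IS NULL
ORDER BY priority, run_at
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- With a big backlog, many future-scheduled rows, and a large worker fleet
-- polling aggressively, each acquire can walk (and skip) a lot of tuples
-- unless an index matches both the predicate and the sort, e.g.:
CREATE INDEX jobs_runnable_idx
  ON jobs (priority, run_at)
  WHERE locked_at IS NULL;
```

Even with an index like that, dead tuples from completed or rescheduled jobs plus autovacuum lag can keep the acquire path expensive under load, which is exactly the fiddling the Redis comparison avoids.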