Comment by ashniu123

1 month ago

For Node.js, my startup used to use [Graphile Worker](https://github.com/graphile/worker) which utilised the same "SKIP LOCKED" mechanism under the hood.
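
For reference, a minimal sketch of that claim pattern (the `jobs` table and its columns here are made up for illustration, not Graphile Worker's actual schema): each worker locks one pending row with `FOR UPDATE SKIP LOCKED`, so concurrent workers skip rows another transaction already holds instead of blocking on them.

```typescript
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from PG* env vars

// Claim and run a single pending job. SKIP LOCKED makes concurrent
// workers pass over rows that are already locked rather than wait.
async function claimAndRunOne(run: (payload: unknown) => Promise<void>) {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    const { rows } = await client.query(
      `SELECT id, payload
         FROM jobs
        WHERE run_at <= now()
        ORDER BY run_at
        LIMIT 1
          FOR UPDATE SKIP LOCKED`
    );
    if (rows.length > 0) {
      await run(rows[0].payload);
      // Delete the job inside the same transaction that holds the lock.
      await client.query("DELETE FROM jobs WHERE id = $1", [rows[0].id]);
    }
    await client.query("COMMIT");
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}
```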

We ran into some serious issues in high-throughput scenarios (~2k jobs/min currently, and ~5k jobs/min during peak hours), switched to Redis+BullMQ, and have never looked back since. Our bottleneck was Postgres performance.
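
For anyone unfamiliar with BullMQ, the equivalent setup looks roughly like this (queue name, job data, and connection details are placeholders):

```typescript
import { Queue, Worker } from "bullmq";

const connection = { host: "localhost", port: 6379 };

// Producer: enqueue a job.
const queue = new Queue("jobs", { connection });
await queue.add("send-email", { to: "user@example.com" });

// Consumer: process jobs with bounded concurrency.
new Worker(
  "jobs",
  async (job) => {
    // ... handle job.data
  },
  { connection, concurrency: 50 }
);
```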

I wonder if SolidQueue runs into similar issues in high-load, high-throughput scenarios...

Struggling at 83 jobs per second (5k/min) sounds like an extreme misconfiguration. That's not high throughput at all, and it shouldn't create any appreciable load on any database.

  • This comes up every time this conversation occurs.

    Yes, PG can theoretically handle just about anything with the right configuration, schema, architecture, etc.

    Finding that right configuration is not trivial. Even dedicated frameworks like Graphile struggle with it.

    My startup had the exact same struggles with PG and made the same migration to BullMQ because we were sick of fiddling with it instead of solving business problems. We are very glad we migrated off PG for our work queues.

    • The issue is that "83 per second" is multiple orders of magnitude off the expected level of performance on any RDBMS running on anything resembling modern hardware.

      I haven't worked with Graphile but this just doesn't pass the sniff test unless those 83 jobs per second are somehow translating into thousands of write transactions per second.

      As an indication, their documentation has a performance section with a benchmark that claims roughly 10k jobs per second on a fairly modest machine.


  • They probably did not batch. It's realistic that they'd have issues if code is written to handle 1 job at a time and needs to make several roundtrips to the same db inside the same locking transaction.

    Leases exist for a reason.
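
    A batched, leased claim looks roughly like this (table and column names invented for illustration): claim up to N jobs in one statement, stamp each with a lease deadline, and commit immediately so no row locks are held while the jobs actually run. A reaper can later make jobs with expired leases claimable again.

    ```typescript
    import { Pool } from "pg";

    // One round trip claims a whole batch. The lease (locked_until)
    // replaces a long-lived locking transaction; expired leases let
    // crashed workers' jobs be claimed again.
    async function claimBatch(pool: Pool, workerId: string, batchSize = 100) {
      const { rows } = await pool.query(
        `UPDATE jobs
            SET locked_by = $1,
                locked_until = now() + interval '60 seconds'
          WHERE id IN (
                SELECT id
                  FROM jobs
                 WHERE run_at <= now()
                   AND (locked_until IS NULL OR locked_until < now())
                 ORDER BY run_at
                 LIMIT $2
                   FOR UPDATE SKIP LOCKED)
        RETURNING id, payload`,
        [workerId, batchSize]
      );
      return rows;
    }
    ```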

    • > if code is written to handle 1 job at a time and needs to make several roundtrips to the same db inside the same locking transaction.

      Do you mean the application code? The worker itself causing the bottleneck is definitely one possibility; however, if that were the case, the issue wouldn't have resolved itself when they switched to a different job queue.
