Comment by ashniu123

1 month ago

For Node.js, my startup used to use [Graphile Worker](https://github.com/graphile/worker) which utilised the same "SKIP LOCKED" mechanism under the hood.
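
For reference, a minimal sketch of that claim pattern (the `jobs` table and its columns here are made up for illustration, not Graphile Worker's actual schema): each worker locks one pending row with `FOR UPDATE SKIP LOCKED`, so concurrent workers skip rows another transaction already holds instead of blocking on them.

```typescript
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from PG* env vars

// Claim and run a single pending job. SKIP LOCKED makes concurrent
// workers pass over rows that are already locked rather than wait.
async function claimAndRunOne(run: (payload: unknown) => Promise<void>) {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    const { rows } = await client.query(
      `SELECT id, payload
         FROM jobs
        WHERE run_at <= now()
        ORDER BY run_at
        LIMIT 1
          FOR UPDATE SKIP LOCKED`
    );
    if (rows.length > 0) {
      await run(rows[0].payload);
      // Delete the job inside the same transaction that holds the lock.
      await client.query("DELETE FROM jobs WHERE id = $1", [rows[0].id]);
    }
    await client.query("COMMIT");
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}
```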

We ran into some serious issues in high-throughput scenarios (~2k jobs/min currently, and ~5k jobs/min during peak hours), switched to Redis+BullMQ, and have never looked back since. Our bottleneck was Postgres performance.
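
For anyone unfamiliar with BullMQ, the equivalent setup looks roughly like this (queue name, job data, and connection details are placeholders):

```typescript
import { Queue, Worker } from "bullmq";

const connection = { host: "localhost", port: 6379 };

// Producer: enqueue a job.
const queue = new Queue("jobs", { connection });
await queue.add("send-email", { to: "user@example.com" });

// Consumer: process jobs with bounded concurrency.
new Worker(
  "jobs",
  async (job) => {
    // ... handle job.data
  },
  { connection, concurrency: 50 }
);
```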

I wonder if SolidQueue runs into similar issues in high-load, high-throughput scenarios...

Struggling at 83 jobs per second (5k/min) sounds like an extreme misconfiguration. That's not high throughput at all, and it shouldn't create any appreciable load on any database.

  • This comes up every time this conversation occurs.

    Yes, PG can theoretically handle just about anything with the right configuration, schema, architecture, etc.

    Finding that right configuration is not trivial. Even dedicated frameworks like Graphile struggle with it.

    My startup had the exact same struggles with PG and made the same migration to BullMQ because we were sick of fiddling with it instead of solving business problems. We are very glad we migrated off PG for our work queues.

    • The issue is that "83 per second" is multiple orders of magnitude off the expected level of performance on any RDBMS running on anything resembling modern hardware.

      I haven't worked with Graphile but this just doesn't pass the sniff test unless those 83 jobs per second are somehow translating into thousands of write transactions per second.

      As an indication, their documentation has a performance section with a benchmark that claims roughly 10k jobs per second on a fairly modest machine.


  • They probably did not batch. It's realistic that they'd have issues if code is written to handle 1 job at a time and needs to make several roundtrips to the same db inside the same locking transaction.

    Leases exist for a reason.
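
    A batched, leased claim looks roughly like this (table and column names invented for illustration): claim up to N jobs in one statement, stamp each with a lease deadline, and commit immediately so no row locks are held while the jobs actually run. A reaper can later make jobs with expired leases claimable again.

    ```typescript
    import { Pool } from "pg";

    // One round trip claims a whole batch. The lease (locked_until)
    // replaces a long-lived locking transaction; expired leases let
    // crashed workers' jobs be claimed again.
    async function claimBatch(pool: Pool, workerId: string, batchSize = 100) {
      const { rows } = await pool.query(
        `UPDATE jobs
            SET locked_by = $1,
                locked_until = now() + interval '60 seconds'
          WHERE id IN (
                SELECT id
                  FROM jobs
                 WHERE run_at <= now()
                   AND (locked_until IS NULL OR locked_until < now())
                 ORDER BY run_at
                 LIMIT $2
                   FOR UPDATE SKIP LOCKED)
        RETURNING id, payload`,
        [workerId, batchSize]
      );
      return rows;
    }
    ```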

    • > if code is written to handle 1 job at a time and needs to make several roundtrips to the same db inside the same locking transaction.

      Do you mean the application code? The worker itself causing the bottleneck is definitely one possibility; however, if that were the case, the issue wouldn't have resolved itself when they switched to a different job queue.
