
Comment by BowBun

1 day ago

Traditional DBs are a poor fit for high-throughput job systems in my experience. The transactions alone around fetching and updating jobs are non-trivial and can dwarf regular data activity in your system, especially for monoliths, which Python and Ruby apps by and large still are.

Personally I've migrated 3 apps _from_ DB-backed job queues _to_ Redis/other-backed systems with great success.

The way that Oban for Elixir and GoodJob for Ruby leverage PostgreSQL allows for very high throughput. It's not something that easily ports to other DBs.

  • Appreciate the added context here, this is indeed some special sauce that challenges my prior assumptions!

  • Interesting. Any docs that explain what/how they do this?

    • A combination of LISTEN/NOTIFY for near-instantaneous reactivity (so workers only need infrequent periodic polling as a fallback), and FOR UPDATE...SKIP LOCKED, which makes it efficient and safe for parallel workers to grab tasks without coordination. It's actually covered in the article near the bottom; there's a rough sketch of the pattern below.

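      For the curious, here's a minimal sketch of that pattern in Python with psycopg2, using a hypothetical "jobs" table and a hypothetical "jobs_inserted" notification channel (not Oban's or GoodJob's actual code or schema, just an illustration of NOTIFY wake-ups plus SKIP LOCKED claiming):

          import select
          import psycopg2
          import psycopg2.extensions

          def run(job_id, args):
              print("running job", job_id, args)  # placeholder for real job execution

          # Claim a batch of available jobs. SKIP LOCKED makes concurrent workers
          # silently skip rows another worker already holds, so they never block
          # on or double-claim each other's jobs.
          CLAIM_SQL = """
          WITH claimed AS (
              SELECT id FROM jobs
              WHERE state = 'available'
              ORDER BY id
              LIMIT 10
              FOR UPDATE SKIP LOCKED
          )
          UPDATE jobs SET state = 'executing'
          WHERE id IN (SELECT id FROM claimed)
          RETURNING id, args
          """

          # Dedicated autocommit connection for LISTEN so notifications are delivered promptly.
          listen_conn = psycopg2.connect("dbname=app")
          listen_conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
          listen_conn.cursor().execute("LISTEN jobs_inserted")  # enqueuers run: NOTIFY jobs_inserted

          work_conn = psycopg2.connect("dbname=app")

          while True:
              # Wake immediately on NOTIFY; otherwise fall back to polling every 5 seconds.
              if select.select([listen_conn], [], [], 5) != ([], [], []):
                  listen_conn.poll()
                  del listen_conn.notifies[:]
              with work_conn, work_conn.cursor() as cur:  # commits the claimed batch's state change
                  cur.execute(CLAIM_SQL)
                  for job_id, args in cur.fetchall():
                      run(job_id, args)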

Transactions around fetching/updating aren't trivial, that's true. However, the work that you're doing _is_ regular activity because it's part of your application logic. That's data about the state of your overall system and it is extremely helpful for it to stay with the app (not to mention how nice it makes testing).
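To make the colocation and testing point concrete, here's a hedged sketch in Python with psycopg2 and hypothetical "orders" and "jobs" tables (not any particular library's API): because the queue lives in the same Postgres database, a job can be enqueued in the very transaction that writes the business data, and a test can assert on it as a plain row with no broker to stub.

    import json
    import psycopg2

    def place_order(conn, user_id, items):
        # One transaction: the order row and its follow-up job commit
        # (or roll back) together.
        with conn, conn.cursor() as cur:
            cur.execute(
                "INSERT INTO orders (user_id, items) VALUES (%s, %s) RETURNING id",
                (user_id, json.dumps(items)),
            )
            order_id = cur.fetchone()[0]
            cur.execute(
                "INSERT INTO jobs (queue, args, state) VALUES (%s, %s, 'available')",
                ("emails", json.dumps({"order_id": order_id})),
            )
        return order_id

    def test_place_order_enqueues_email(conn):
        # The enqueued job is just a row in the same database.
        place_order(conn, user_id=1, items=["book"])
        with conn.cursor() as cur:
            cur.execute("SELECT count(*) FROM jobs WHERE queue = 'emails'")
            assert cur.fetchone()[0] == 1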

Regarding overall throughput, we've written about running one million jobs a minute [1] on a single queue, and there are numerous companies running hundreds of millions of jobs a day with oban/postgres.

[1]: https://oban.pro/articles/one-million-jobs-a-minute-with-oba...

  • Appreciate the response, I'm learning some new things about the modern listening mechanisms for DBs which unlock more than I believed was possible.

    For your first point - I would counter that a lot of data about my systems lives outside of the primary database. There is, however, an argument about adding a dependency and about testing complexity. Those are by and large solved problems at the scale I work with (not huge, not tiny).

    I think both approaches work and I honestly just appreciate you guys holding Celery to task ;)

How high a throughput were you working with? I've used Oban at a few places that had pretty decent throughput, and it was OK. Not disagreeing with your approach at all, just trying to get an idea of what kinds of workloads you were running to compare.