Comment by brianwawok

9 years ago

> There are many benefits to the transactional integrity you get when using direct connections to a relational database like PostgreSQL. Putting a queue outside of your db breaks ACID compliance

What? Most job queues inherently exist to do something outside of the database. For example: I need to send some emails, or I need to resize some images. You cannot wrap sending an email in the same transaction as your queue work, nor can you wrap resizing some images in the same transaction. The ENTIRE reason you are using a message queue is to go to some external work. So this literally makes no sense.
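To illustrate that point, here is a minimal sketch, not anyone's production worker. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and the `jobs` table, `make_queue`, `run_job` and the in-memory `sent` list (a fake mail server) are all made up for the example. The database update rolls back when the worker fails, but the email has already gone out, so a retry sends it twice:

```python
import sqlite3

sent = []  # hypothetical stand-in for an external mail server


def send_email(recipient):
    # External side effect: the database cannot undo this.
    sent.append(recipient)


def make_queue():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, recipient TEXT NOT NULL, "
        "done INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("INSERT INTO jobs (recipient) VALUES (?)", ("a@example.com",))
    conn.commit()
    return conn


def run_job(conn, job_id, crash_before_commit=False):
    # `with conn` commits on success and rolls back if an exception escapes.
    with conn:
        row = conn.execute(
            "SELECT recipient FROM jobs WHERE id = ? AND done = 0", (job_id,)
        ).fetchone()
        if row is None:
            return
        send_email(row[0])
        conn.execute("UPDATE jobs SET done = 1 WHERE id = ?", (job_id,))
        if crash_before_commit:
            raise RuntimeError("simulated crash before commit")


def is_done(conn, job_id):
    return conn.execute("SELECT done FROM jobs WHERE id = ?", (job_id,)).fetchone()[0]
```

Run the job once with `crash_before_commit=True` and once normally: the row ends up marked done exactly once, but `sent` contains the recipient twice. The transaction only covers the row, so the external work needs to be idempotent either way.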

> and may result in silent conflicts or corruption when different services read and update the same data.

You need to program this in one way or another no matter what. If you programmed your code to cause "silent conflicts or corruption" then I guess you are going to be in trouble. So don't do that.

> DB transactions eliminate a bunch of the weird and dangerous corner cases you will get with data flows that are performed over multiple network hops having uncertain timing, congestion and reliability.

Again, you are missing the point. MOST job queue work is stuff outside of the database anyway. You still have stuff outside the database.

> The performance and scalability of monolithic dbs is often better than people expect because the data stays closer to CPUs and doesn't move across multiple nodes on a relatively speaking, slow and unreliable network.

Not that relevant. If you are doing 10,000 messages a second (or more!) to your job queue, and are looking to hold open a bunch of transactions, you are going to be in for some pain.

> Trying to add transactional safety after the fact on top of a clustered db/queue is a huge headache and you will never get it as good as transactions that happen inside a single server.

And trying to use PostgreSQL as a job queue is going to give you 1% or 0.1% of the throughput you would get from RabbitMQ or Kafka or SQS or Cloud Pub/Sub as a job queue. You are trying too hard to use a database for the wrong thing.

Oh sure, if you want to use an outside queue for something simple, one-directional and unacknowledged, like sending emails, that is fine.

People these days want to use queues to send all kinds of event messages multidirectionally between systems. They break their ACID guarantees and they corrupt their data. At some scales you don't have a choice, but if you can keep all your lower-bandwidth stuff happening through a direct connection to Postgres, you get a more reliable system, and it's worth putting in some effort to achieve that. And "lower bandwidth" here is not that low. Postgres scales better than most people think if you put a bit of effort into optimizing it.
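A rough sketch of what keeping the queue in the same database buys you, with the caveat that it uses Python's built-in `sqlite3` as a stand-in for Postgres and that the `orders` and `jobs` tables and the `place_order` helper are invented for illustration. The data change and the enqueue commit or roll back together, so you never get a job without its row or a row without its job:

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, kind TEXT NOT NULL, "
        "order_id INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def place_order(conn, item, fail=False):
    # One transaction covers both the business row and the queued job.
    # `with conn` commits on success and rolls back if an exception escapes.
    with conn:
        cur = conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        conn.execute(
            "INSERT INTO jobs (kind, order_id) VALUES (?, ?)",
            ("send_receipt", cur.lastrowid),
        )
        if fail:
            raise RuntimeError("simulated failure before commit")


def counts(conn):
    orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    jobs = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
    return orders, jobs
```

After one successful `place_order` and one with `fail=True`, `counts(conn)` shows one order and one job: the failed attempt left neither behind. With an external broker, the insert and the publish are two separate operations and either one can succeed without the other.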

There are plenty of situations where "good enough" is a better choice, and "not adding yet another tool set or system for our 2 person dev team" has a quality all its own.

There are absolutely situations where it must be both RIGHT and as fast as possible under all loads, and there are situations where it needs to be right and not do WRONG things under abnormal loads. For a startup, worrying about handling thousands of transactions can wait until it has triple-digit customers.

> The ENTIRE reason you are using a message queue is to go to some external work.

There are other reasons to use queues besides that, e.g. async communication between two services. This is an example where you could feasibly use a database-backed queue instead. Not saying it's a good or bad idea, depends on the circumstances ofc.

And GPP was only answering a hypothetical and you came out swinging:

> The ENTIRE reason [...]

> So this literally makes no sense.

> So don't do that.