Comment by Smudge

9 years ago

Very common in the web context -- you perform some form of relational persistence while also inserting a job to schedule background work (like sending an email). Having those both in the same transaction gets rid of a lot of tricky failure cases.

A transaction in a database does not help you here.

Let's say these are your steps:

1) Open a transaction

2) Claim an email to send

3) Send the email

4) Mark the email as sent

5) Close transaction

What if your web client crashes between 3 and 4? The email is not going to get marked as sent, and the transaction will roll back. You have no choice but to resend the email.
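
To make that failure window concrete, here is a rough sketch of steps 1-5 with psycopg2 (the table, columns, and send_email() are made up for illustration):

    import psycopg2

    def send_email(address, body):
        ...  # placeholder for the actual email-sending call

    conn = psycopg2.connect("dbname=app")              # connection details assumed
    with conn:                                         # 1) open a transaction
        with conn.cursor() as cur:
            # 2) claim an email to send (the row lock keeps other workers away)
            cur.execute(
                "SELECT id, address, body FROM emails"
                " WHERE sent = false LIMIT 1 FOR UPDATE SKIP LOCKED"
            )
            row = cur.fetchone()
            if row:
                email_id, address, body = row
                send_email(address, body)              # 3) send the email (external side effect)
                # if the process dies right here, the mail already went out but the
                # transaction rolls back, so the row stays "unsent" and gets resent
                cur.execute(                           # 4) mark the email as sent
                    "UPDATE emails SET sent = true WHERE id = %s", (email_id,)
                )
    # 5) leaving the `with conn:` block commits (or rolls back on an exception)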

You could have done this same exact thing with RabbitMQ and an external worker (Celery, etc.). You either ack just BEFORE you do the work (between 2 and 3), in which case you will never double-send but you risk dropping the email, or you ack just AFTER you do the work (between 3 and 4), in which case you guarantee the work always gets done, but at the risk of a double send.

If your task is idempotent this is super easy: just ack after the work is complete and you will be good. If your task is not idempotent (like sending an email), this takes a bit more work... but I think you have that same exact work in the database transaction example (see above).
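
For reference, the two ack strategies with a plain pika consumer look roughly like this (the queue name and send_email() are placeholders):

    import pika

    def send_email(payload):
        ...  # placeholder for the actual email-sending call

    def handle(ch, method, properties, body):
        # at-most-once: ack BEFORE the work -- never double-sends, may drop the email
        #   ch.basic_ack(delivery_tag=method.delivery_tag)
        #   send_email(body)
        # at-least-once: ack AFTER the work -- never drops, may double-send
        send_email(body)
        ch.basic_ack(delivery_tag=method.delivery_tag)

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="emails", durable=True)
    channel.basic_consume(queue="emails", on_message_callback=handle, auto_ack=False)
    channel.start_consuming()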

  • Email is just one example, but maybe a better example is talking to an API that supports some form of idempotency. You don't want to talk to the API during a transaction that involves other related data persistence, but you can transactionally store the intent to talk to that API (i.e. you insert a job into a queue, and that job will eventually do the API interaction). But even in the email case, you can benefit from a transaction:

    1) Open a transaction

    2) Persist some business/app data

    3) Persist a job into your queue that will send an email relating to #2

    4) Close transaction

    So you've at least transactionally stored the data changes and the intent to send an email. When actually sending the email you probably want something that guarantees at-most-once delivery with some form of cleanup in the failure states (it's a bit more work, as you said).
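
    A minimal sketch of those four steps with psycopg2, assuming a simple jobs table that a worker polls (all table and column names are made up):

        import json
        import psycopg2

        conn = psycopg2.connect("dbname=app")
        with conn:                                      # 1) open a transaction
            with conn.cursor() as cur:
                cur.execute(                            # 2) persist some business/app data
                    "INSERT INTO orders (customer_id, total) VALUES (%s, %s) RETURNING id",
                    (42, 99.90),
                )
                order_id = cur.fetchone()[0]
                cur.execute(                            # 3) persist a job to send the email for #2
                    "INSERT INTO jobs (kind, payload) VALUES (%s, %s)",
                    ("send_order_email", json.dumps({"order_id": order_id})),
                )
        # 4) leaving the block commits both rows atomically; a separate worker later
        #    picks up the job and does the actual email/API call outside the transaction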

You can still have the notion of a transaction while integrating RabbitMQ, like this:

You do your mutations in a transaction, and in that same transaction you also execute a NOTIFY command. If the transaction is successful, the NOTIFY will go through at the end of the transaction. The notify events can then be "bridged" to a messaging server like RabbitMQ (see my other comment).
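
Roughly, in psycopg2 terms (the channel and table names are just examples):

    import json
    import psycopg2

    conn = psycopg2.connect("dbname=app")
    with conn:                                          # mutations + NOTIFY in one transaction
        with conn.cursor() as cur:
            cur.execute("UPDATE orders SET status = 'paid' WHERE id = %s", (42,))
            # pg_notify(channel, payload) is the function form of NOTIFY; the
            # notification is queued and only delivered when the COMMIT succeeds
            cur.execute(
                "SELECT pg_notify(%s, %s)",
                ("events", json.dumps({"entity": "order", "id": 42, "status": "paid"})),
            )
    # after the commit, anything LISTENing on "events" (e.g. pg-amqp-bridge)
    # receives the payload and can forward it to RabbitMQ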

  • Nice -- pg-amqp-bridge seems like a clever solution for coordinating the two. It still puts some amount of the queuing load on the DB, but I'd be comfortable with that trade-off if I really wanted to use rabbitmq as the primary queue.

    Question though -- Does it guarantee at-least-once or at-most-once on the NOTIFY? (Like, if there is a network blip, will it retry the NOTIFY?) And if it is at-least-once, I assume that consumer apps will have to handle deduplication/idempotency.

    • Short answer: no, it doesn't (at least not right now; it was released a week ago :) ), but dismissing it only for that would be superficial. Most cases don't need those guarantees, i.e. you can tolerate a few lost messages.

      For example, say you are implementing real-time updates in your app using this. What's the probability of a network glitch happening at the exact moment when two users are logged in to the same system and an event produced by one needs to be synced to the other? And even if that user does lose the event, is it really that critical, considering they will soon move to another screen and reload the data entirely?

      From RabbitMQ's point of view, the DB and the bridge are producers. What you are really asking here is: does the "producer" guarantee delivery? To do that, the producer itself would need to become a "queue" system in case it fails to communicate with RabbitMQ.

      Considering we are talking web here, the producers are usually scripts invoked by an HTTP call, so no system gives you that guarantee (when communication with RabbitMQ fails).

      However, I think the network (in the datacenter) is quite reliable, so there is no point in over-engineering for that case.

      If the system can tolerate a few seconds of downtime, it's easy enough to implement a heartbeat system that restarts this tool when needed. Also, you can run 2-3 instances to make it redundant and then use a correlationId to dedup the messages (rough sketch below).
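
      The consumer-side dedup could look something like this (pika; assumes the producers/bridges stamp each message with a correlationId, and the in-memory set is just for the sketch -- a real setup would use Redis or a table with a TTL):

          import pika

          seen = set()   # correlation ids we've already processed (sketch only)

          def process(body):
              ...        # placeholder for the real work

          def handle(ch, method, properties, body):
              cid = properties.correlation_id
              if cid is not None and cid in seen:      # duplicate from a redundant bridge
                  ch.basic_ack(delivery_tag=method.delivery_tag)
                  return
              process(body)
              if cid is not None:
                  seen.add(cid)
              ch.basic_ack(delivery_tag=method.delivery_tag)

          connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
          channel = connection.channel()
          channel.queue_declare(queue="events", durable=True)
          channel.basic_consume(queue="events", on_message_callback=handle, auto_ack=False)
          channel.start_consuming()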

      A more robust tool would be https://github.com/confluentinc/bottledwater-pg, but it's for Kafka, and the major downside for me is that it can't be used with RDS since it's installed as a plugin to PostgreSQL.

    • NOTIFY is very lightweight in Postgres; it takes up very little CPU, if that's your worry. Anyway, this is not meant for log storage/streaming (and neither is PostgreSQL). My use case for it is "real-time updates" (a reasonable number of events that need to be routed in complex ways to multiple consumers).