Comment by dom0
9 years ago
> Why use a database as a queue?
Already have a central, configured and monitored server and need "just a small queue" for something. This is not per se a bad decision. For the same reason it doesn't have to be a bad idea to cache things in the main database, instead of using a dedicated cache like Redis.
In my experience, the cost of adding and maintaining another storage subsystem to a project is often hugely underestimated. It's easy to see the benefits and ignore the costs.
If I can solve a problem reasonably well by adding a table to my Postgres DB, that will always beat out adding the specialized ScrewdriverDB that does it perfectly.
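As a sketch of that "just add a table" approach (my own illustration, not from the thread): a queue can be one table plus an atomic claim. SQLite is used here only so the example is self-contained; on Postgres you would typically claim with `SELECT ... FOR UPDATE SKIP LOCKED` so concurrent workers don't grab the same row. Table and column names are made up.

```python
import sqlite3

# Illustrative sketch: a queue as one table in the app's existing database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        id      INTEGER PRIMARY KEY,
        payload TEXT NOT NULL,
        status  TEXT NOT NULL DEFAULT 'pending'  -- pending | running | done
    )
""")

def enqueue(payload):
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))

def claim():
    """Claim the oldest pending job in one transaction, or return None."""
    with conn:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE status = 'pending' "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?",
                     (row[0],))
        return row

enqueue("send-invoice-42")
```

Because the queue lives in the same database as everything else, it rides along with the existing backup, monitoring, and transaction machinery for free.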
I agree. If it sounds simple, then you are probably not thinking hard enough.
Think for example about how to do backups. If you have a database and a completely separate queue system, your application state is distributed across two places. This means taking consistent backups is not straightforward. You can of course try to work around this at the application level (database as the single source of truth, queues as a "nice to have"), but that makes things more complicated for the app.
> This means taking consistent backups is not straightforward.
And they're not even straightforward to begin with...
Depends on what you're adding. Running Redis is dead simple and easy to have visibility into. RabbitMQ / Kafka are much larger undertakings.
If you already had a need for a durable database, and so you properly implemented Postgres streaming-archiving + disaster recovery, at much personal effort for your tiny startup... and you now need a durable queue as well... then "just installing Redis" won't get you there. You'll need to do that whole ops setup all over again for Redis. Whereas, if your queue is in Postgres, the ops problem is already solved.
If you have a full-time ops staff, throwing another marginal piece of infrastructure on the pile isn't much of an issue. If you're your own "ops staff of one", each moving part of your infrastructure—that you need to ensure works, to guarantee you won't lose client data—is another thing slowing down your iteration speed.
4 replies →
Depends on the scale mostly. People often forget how not every project is Google/Facebook/Twitter.
RabbitMQ runs on Docker; it's more or less the same work to launch as Redis these days.
(It can take a bit more tuning so I think it is unfair to say it is the SAME work, but it is seriously not a huge deal to run RabbitMQ in the post docker world)
2 replies →
This. Also, things like RabbitMQ are complex and their durability properties can be different than those provided by your RDBMS. This can get problematic if you are mixing tasks in the queue that have different priorities. For example, emailing the invoice to a client should not fail silently and should happen at most once. Same with a notification from your doctor that you need to call to discuss your test results. Tossing that into an ephemeral queue is probably not the best solution.
Having said that, RabbitMQ does have a ton of settings where you can turn durability up/down as much as you want.
Email is unreliable. You should number invoices, and tell recipients to ignore duplicates.
It also makes it easier to delete or update the queue entry in an atomic transaction that spans other tables. If that has value for the specific use case.
Very common in the web context -- you perform some form of relational persistence while also inserting a job to schedule background work (like sending an email). Having those both in the same transaction gets rid of a lot of tricky failure cases.
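A minimal sketch of that pattern (SQLite for portability; table and column names are made up): the business row and its background-job row commit or roll back together, so a worker can never pick up a job for data that never made it in.

```python
import sqlite3

# Sketch of a "transactional enqueue": persist the business record and
# schedule the background work in one atomic transaction.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
    CREATE TABLE jobs     (id INTEGER PRIMARY KEY, task TEXT NOT NULL);
""")

def create_invoice(customer):
    # One transaction: afterwards either both rows exist, or neither does.
    with conn:
        cur = conn.execute(
            "INSERT INTO invoices (customer) VALUES (?)", (customer,))
        conn.execute("INSERT INTO jobs (task) VALUES (?)",
                     (f"email-invoice:{cur.lastrowid}",))

create_invoice("acme")
```

With an external queue you would instead have to handle the half-done cases yourself: the invoice committed but the enqueue failed, or the enqueue succeeded but the commit failed.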
A transaction in a database does not help you here.
Let's say these are your steps:
1) Open a transaction
2) Claim an email to send
3) Send the email
4) Mark email as sent
5) Close transaction
Say your web client crashes between 3 and 4. The email is not going to get marked as sent, and the transaction will roll back. You have no choice but to resend the email.
You could have done this same exact thing with RabbitMQ and an external worker (Celery etc.). You either ack just BEFORE you start the work (between 2 and 3), in which case you will never double send but risk dropping the message, or you ack just AFTER the work (between 3 and 4), in which case the work is guaranteed to happen, but at the risk of a double send.
If your task is idempotent this is super easy, just ack after the work is complete and you will be good. If your task is not idempotent (like sending an email), this takes a bit more work... but I think you have that same exact work in the database transaction example (see above)
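One common way to soften the non-idempotent case under at-least-once delivery (my own sketch, not from the thread) is to record which job ids have already been processed and skip redeliveries. The names here are hypothetical: `sent` stands in for a durable "already processed" table, `outbox` for the real side effect.

```python
# Hypothetical at-least-once worker: dedupe redelivered jobs by id.
sent = set()    # stands in for a durable "already processed" table
outbox = []     # stands in for actually sending the email

def handle(job_id, payload):
    if job_id in sent:         # redelivered after a crash/requeue: skip
        return
    outbox.append(payload)     # the non-idempotent work
    sent.add(job_id)           # marked done only after the work succeeded

handle(1, "invoice #42")
handle(1, "invoice #42")       # duplicate delivery is collapsed
```

Note the crash window from the comment above still exists: a crash between the send and the mark would still double send. The dedupe only collapses redeliveries that arrive after a successful mark, which is exactly the "same exact work" point being made here.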
1 reply →
you can still have the notion of a transaction while integrating rabbitmq like this
you do your mutations in a transaction, and in that same transaction you execute a NOTIFY command. If the transaction commits, the notify goes out at the end of it. The notify events can then be "bridged" to a messaging server like RabbitMQ (see my other comment).
5 replies →
> Already have a central, configured and monitored server and need "just a small queue" for something.
Fine. So use SQS or cloud pub sub. Both take 0 "server configuration work", and you aren't adding load to likely the single most expensive part of your infrastructure (RDBMS).
(The exception, where an RDBMS is not the most expensive part of your infrastructure, is when you have very large data in some NoSQL store, or a machine learning GPU array... but not sure that is super relevant here)
That's an entirely valid line of reasoning, but it only applies to a certain set of applications. Further, SQS or whatever have the same drawbacks as other external queues compared to an in-DB queue; see all the sibling comments in this big thread.
Yep. Better asked as "Why use a conventional|relational database as a queue?"
Because all queues have to be databases, i.e. have the same qualities/assurances as databases.