Comment by brianwawok

9 years ago

Why use a database as a queue? It's well known that this neither scales well nor works particularly well. If you need big scale, use something like RabbitMQ or Kafka. If you want to avoid server setup, use SQS or Cloud Pub/Sub. If you want something lighter weight, use Redis.

This is kind of like an article about how most people use a hammer to drive screws the wrong way. Which is cool, and good for learning why using a hammer to put in screws is a bad idea. But the outcome of all this should be "use the proper tool for the job", not "try to come up with a neat trick to make this work in the wrong system".

There are many benefits to the transactional integrity you get when using direct connections to a relational database like PostgreSQL. Putting a queue outside of your db breaks ACID compliance and may result in silent conflicts or corruption when different services read and update the same data. DB transactions eliminate a bunch of the weird and dangerous corner cases you get with data flows performed over multiple network hops with uncertain timing, congestion and reliability.

The performance and scalability of monolithic dbs is often better than people expect because the data stays closer to the CPUs and doesn't move across multiple nodes on a comparatively slow and unreliable network. Trying to add transactional safety after the fact on top of a clustered db/queue is a huge headache, and you will never get it as good as transactions that happen inside a single server.

Of course there are concerns about scalability with a monolith. Manually sharding monoliths is a bit of work, but it forces you to do your homework and find the natural and optimal sharding points. In a lot of cases there are natural separation lines, such as between groups of customers, that you can use to shard and scale things up. Blindly distributing all data across a bunch of nodes has huge performance and reliability implications. A CPU's connection to its RAM and disks is orders of magnitude more reliable and fast than connections over the network. Unless you are building something like a social network where everything is connected to everything, you don't need things to default to running over clusters and over the network.

  • Also, while queuing systems are preferable to a database for some of the properties mentioned by GP, they break down for some use cases. For example, for one operation, we needed a time-based priority queue that deduplicates on some attributes and refreshes the timestamps. That's where Postgres really shines - and because it's not that many entries or that much throughput anyway, it was one system less to operate and works perfectly fine to this day.

    Just keep in mind Postgres isn't made for high contention, deletion and mutation rates on really small tables (Uber "Schemaless" case)

  • In addition to all these other benefits, don't forget the impedance match you get in support. If you have a monolithic DB, the interface to all application information is the same, rather than having to learn multiple technologies.

    • + install, configure, deploy, troubleshoot those technologies.

      I think there is a lot of value in using the wrong tool for the job until it becomes "an issue we need to fix".

  • > There are many benefits to the transactional integrity you get when using direct connections to a relational database like PostgreSQL. Putting a queue outside of your db breaks ACID compliance

    What? Most job queues exist precisely to do something outside of the database. For example: I need to send some emails, or I need to resize some images. You cannot wrap sending an email in the same transaction as your queue work, nor can you wrap resizing images in a transaction. The ENTIRE reason you are using a message queue is to go do some external work. So this literally makes no sense.

    > and may result in silent conflicts or corruption when different services read and update the same data.

    You need to program this in one way or another no matter what. If you programmed your code to do "silent conflicts or corruption" then I guess you are going to be in trouble. So don't do that.

    > DB transactions eliminate a bunch of the weird and dangerous corner cases you will get with data flows that are performed over multiple network hops having uncertain timing, congestion and reliability.

    Again, you are missing the point. MOST job queue work is stuff outside of the database anyway. You still have stuff outside the database.

    > The performance and scalability of monolithic dbs is often better than people expect because the data stays closer to CPUs and doesn't move across multiple nodes on a relatively speaking, slow and unreliable network.

    Not that relevant. If you are doing 10,000 messages a second (or more!) to your job queue, and are looking to hold open a bunch of transactions, you are going to be in for some pain.

    > Trying to add transactional safety after the fact on top of a clustered db/queue is a huge headache and you will never get it as good as transactions that happen inside a single server.

    And trying to use PostgreSQL as a job queue is going to give you 1% or 0.1% of the throughput of RabbitMQ or Kafka or SQS or Cloud Pub/Sub as a job queue. You are trying too hard to use a database for the wrong thing.

    • Oh sure, if you want to use an outside queue for something simple, one-directional and unacknowledged like sending emails, that is fine.

      People these days want to use queues to send all kinds of event messages multi-directionally between systems. They break their ACID guarantees and corrupt their data. At some scales you don't have a choice, but if you can keep all your lower-bandwidth stuff happening through a direct connection to Postgres, you get a more reliable system, and it's worth putting in some effort to achieve that. And "lower bandwidth" here is not that low. Postgres scales better than most people think if you put a bit of effort into optimizing.

    • There are plenty of situations where "good enough" is a better choice, and "not adding yet another tool set or system for our 2 person dev team" has a quality all its own.

      There are absolutely situations where it must be both RIGHT and as fast as possible under all loads, and there are situations where it needs to be right and not do WRONG things under abnormal loads. For a startup, worrying about handling thousands of transactions can wait until you have triple-digit customers.

    • > The ENTIRE reason you are using a message queue is to go to some external work.

      There are other reasons to use queues besides that, e.g. async communication between two services. This is an example where you could feasibly use a database-backed queue instead. Not saying it's a good or bad idea; it depends on the circumstances, of course.

      And GPP was only answering a hypothetical and you came out swinging:

      > The ENTIRE reason [...]

      > So this literally makes no sense.

      > So don't do that.
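To make one of the upthread points concrete: the deduplicating, timestamp-refreshing priority queue described in one of the replies above maps directly onto an upsert. A minimal sketch, using an in-memory SQLite database so it runs self-contained (Postgres accepts the same `INSERT ... ON CONFLICT ... DO UPDATE` syntax); the table and column names are made up for illustration:

```python
import sqlite3

# In-memory DB for the sketch; in production this would be Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE task_queue (
        entity_id  TEXT PRIMARY KEY,   -- attribute we deduplicate on
        payload    TEXT NOT NULL,
        run_at     REAL NOT NULL       -- time-based priority
    )
""")

def enqueue(entity_id, payload, run_at):
    # A duplicate entity refreshes the timestamp/payload instead of
    # adding a second row.
    conn.execute(
        """INSERT INTO task_queue (entity_id, payload, run_at)
           VALUES (?, ?, ?)
           ON CONFLICT(entity_id) DO UPDATE
           SET payload = excluded.payload, run_at = excluded.run_at""",
        (entity_id, payload, run_at),
    )

enqueue("user-1", "sync", 100.0)
enqueue("user-2", "sync", 50.0)
enqueue("user-1", "sync-again", 200.0)   # dedupes and refreshes user-1

rows = conn.execute(
    "SELECT entity_id, run_at FROM task_queue ORDER BY run_at").fetchall()
print(rows)  # [('user-2', 50.0), ('user-1', 200.0)]
```

A worker would then just poll with `SELECT ... ORDER BY run_at LIMIT 1`; because duplicates collapse into one row, the table stays small.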

> Why use a database as a queue?

Already have a central, configured and monitored server and need "just a small queue" for something. This is not per se a bad decision. For the same reason it doesn't have to be a bad idea to cache things in the main database, instead of using a dedicated cache like Redis.

  • In my experience, the cost of adding and maintaining another storage subsystem to a project is often hugely underestimated. It's easy to see the benefits and ignore the costs.

    If I can solve a problem reasonably well by adding a table to my Postgres DB, that will always beat out adding the specialized ScrewdriverDB that does it perfectly.

    • I agree. If it sounds simple, then you are probably not thinking hard enough.

      Think, for example, about how to do backups. If you have a database and a completely separate queue system, your application state is distributed across two places. This means taking consistent backups is not straightforward. You can of course try to work around this at the application level (database as the single source of truth, queues as a "nice to have" thing), but this makes things more complicated for the app.


    • Depends on what you're adding. Running Redis is dead simple and easy to have visibility into. RabbitMQ / Kafka are much larger undertakings.


  • This. Also, things like RabbitMQ are complex and their durability properties can be different than those provided by your RDBMS. This can get problematic if you are mixing tasks in the queue that have different priorities. For example, emailing the invoice to a client should not fail silently and should happen at most once. Same with a notification from your doctor that you need to call to discuss your test results. Tossing that into an ephemeral queue is probably not the best solution.

    Having said that, RabbitMQ does have a ton of settings where you can turn durability up/down as much as you want.

    • Email is unreliable. You should number invoices, and tell recipients to ignore duplicates.

  • It also makes it easier to delete or update the queue entry in an atomic transaction that spans other tables. If that has value for the specific use case.

    • Very common in the web context -- you perform some form of relational persistence while also inserting a job to schedule background work (like sending an email). Having those both in the same transaction gets rid of a lot of tricky failure cases.


  • > Already have a central, configured and monitored server and need "just a small queue" for something.

    Fine. So use SQS or Cloud Pub/Sub. Both take zero server-configuration work, and you aren't adding load to what is likely the single most expensive part of your infrastructure (the RDBMS).

    (The exception, where an RDBMS is not the most expensive part of your infrastructure, is where you have very large data with either some NoSQL system or a machine-learning GPU array... but not sure that is super relevant here.)

    • That's an entirely valid line of reasoning, but it only applies to a certain set of applications. Further, SQS or whatever have the same drawbacks as other external queues compared to an in-DB queue; see all the sibling comments in this big thread.

  • Yep. Better asked as "Why use a conventional|relational database as a queue?"

    Because all queues have to be databases, i.e. have the same qualities/assurances as databases.

Author of PostgreSQL SKIP LOCKED feature here. I gave a talk about queues in PostgreSQL covering the question "why do this in a database", for what it's worth:

https://www.pgcon.org/2016/schedule/track/Applications/929.e...

Mostly the same arguments made by commentators here.

SKIP LOCKED exists in several major RDBMSs. We borrowed Oracle's syntax, and MySQL has just done the same. The Gray transaction-processing book calls it "read past" isolation.
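As a sketch of the dequeue pattern the feature enables (my illustration, not taken from the talk; the `jobs` schema is invented): in Postgres, concurrent workers each grab the oldest unclaimed job with `SELECT ... FOR UPDATE SKIP LOCKED`, skipping rows other transactions hold locked instead of blocking on them. The runnable version below uses SQLite, which has no SKIP LOCKED, so a single-connection claim stands in and the Postgres query appears as a comment:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        id      INTEGER PRIMARY KEY,
        status  TEXT NOT NULL DEFAULT 'queued',
        payload TEXT NOT NULL
    )
""")
conn.executemany("INSERT INTO jobs (payload) VALUES (?)",
                 [("a",), ("b",), ("c",)])

# The Postgres claim that SKIP LOCKED makes safe for many workers:
#   SELECT id, payload FROM jobs
#   WHERE status = 'queued'
#   ORDER BY id
#   FOR UPDATE SKIP LOCKED
#   LIMIT 1;
# Rows locked by another in-flight transaction are skipped, not waited on.

def claim_one(conn):
    """Claim the oldest queued job. Select-then-update is only safe here
    because this sketch has a single connection; with multiple workers
    you need the locking query above."""
    with conn:  # one transaction per claim
        row = conn.execute(
            "SELECT id, payload FROM jobs "
            "WHERE status = 'queued' ORDER BY id LIMIT 1").fetchone()
        if row is None:
            return None
        conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?",
                     (row[0],))
        return row

job1 = claim_one(conn)
job2 = claim_one(conn)
print(job1, job2)  # (1, 'a') (2, 'b')
```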

  • "...you like PostgreSQL enough to attend a conference". Future classic right there. (Not being sarcastic, I loved that joke)

In addition to the many good replies:

One thing I like to point out whenever this comes up is that you should aim for a hybrid solution. Don't use a queue to store state, use it to coordinate. Databases are great at dealing with transactional state, queues are good at sequentially handing out data in order.

For example: say we're generating reports. First you create a table "report_tasks", each row representing a report to be generated. Let each task have a status field and a description of what's to be done or whatever. You can use foreign keys here. Then create the rows, and for each row, publish a queue message with the ID of your row. The queue consumer then needs to read the row by its ID, do the necessary processing and update the status. You can of course also do things like avoid duplicate processing by looking at the status field.

What this solves is introspection/queryability. Queues are for the most part opaque (although Kafka at least keeps the log around even after processing). This allows you to know exactly what work is pending and what has been completed, and do things like periodically retry the tasks or even preemptively cancel them. With RabbitMQ alone this is much more difficult, since you can't look into the queue without consuming and republishing its contents. The best you can do is to funnel ACKed messages into another queue, but you can't see what's pending.
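A compressed sketch of that report_tasks flow, with an in-process `queue.Queue` standing in for the real broker (RabbitMQ or similar) and SQLite standing in for Postgres, so the example runs self-contained; all names are illustrative:

```python
import queue
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE report_tasks (
        id          INTEGER PRIMARY KEY,
        description TEXT NOT NULL,
        status      TEXT NOT NULL DEFAULT 'pending'
    )
""")

broker = queue.Queue()   # stand-in for RabbitMQ: carries only row IDs

def submit(description):
    # State lives in the DB; the queue message is just a coordination signal.
    cur = conn.execute(
        "INSERT INTO report_tasks (description) VALUES (?)", (description,))
    broker.put(cur.lastrowid)

def consume():
    task_id = broker.get()
    row = conn.execute(
        "SELECT status FROM report_tasks WHERE id = ?", (task_id,)).fetchone()
    if row[0] != 'pending':        # duplicate delivery: skip via status field
        return None
    # ... generate the report here ...
    conn.execute("UPDATE report_tasks SET status = 'done' WHERE id = ?",
                 (task_id,))
    return task_id

submit("monthly sales")
submit("churn summary")
done = consume()
pending = conn.execute(
    "SELECT COUNT(*) FROM report_tasks WHERE status = 'pending'"
).fetchone()[0]
print(done, pending)  # 1 1
```

Because state is in the table, "what's pending?" is a plain SELECT at any time, regardless of what's in flight on the broker.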

The biggest reason I've found for using a database as a queue is that you can actually run queries against it, both writes and reads. This is very useful when different entries in the queue can have some relationship that matters when processing.

Do you want to wait 30 minutes after an item is queued and then process all queued items for a given entity (e.g. user) at once? Do you want to support more complex retry logic, such as efficiently handling some subset of jobs failing due to external dependencies being unavailable while others can complete? Do you want to be able to track what jobs are in progress (doesn't really work with SKIP LOCKED), and even have the ability to update their status while processing for finer-grained tracking? Do you want to be able to effectively monitor the contents of the queue beyond just the size (and some queues make it quite inefficient just to get that)?

Some of these are possible with purpose made queueing systems but, in my experience, even if they are they can be quite inefficient, complicated, and are often poorly tested and supported areas of functionality. For example the last time I tried to use JMS selectors with ActiveMQ (selectors let you filter messages based on a subset of SQL) it was a nightmare both performance wise and functionality wise and I wished I just had a database that was actually built to support SQL.

Other points about the cost of introducing a new technology (especially if you need strong durability, availability, etc.) and being able to update your database in the same transaction are also valid, but can often be dealt with by some extra effort.

Don't get me wrong, if you just need a queuing system the first thing to consider is a system designed for that purpose, especially if you aren't very knowledgeable about databases and have basic requirements. It can be nuanced to correctly and efficiently use a general purpose database as queue. At the end of the day, however, keep in mind that under the hood a queueing system is just a database that exposes a limited API that makes it easier to get basic operations right and hopefully easier to scale (sadly I have seen all too many queuing systems that utterly fail at that part) at the expense of flexibility.
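For instance, the "wait 30 minutes, then process all of a user's queued items at once" case above is a single `GROUP BY ... HAVING` query against the queue table. A sketch with SQLite standing in for Postgres; the table, columns, and timestamps-as-floats are all invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE outbox (
        id        INTEGER PRIMARY KEY,
        user_id   TEXT NOT NULL,
        item      TEXT NOT NULL,
        queued_at REAL NOT NULL   -- seconds, for illustration
    )
""")
conn.executemany(
    "INSERT INTO outbox (user_id, item, queued_at) VALUES (?, ?, ?)",
    [("u1", "a", 0.0), ("u1", "b", 10.0), ("u2", "c", 55.0)])

def due_batches(now, delay=30.0):
    """Users whose oldest queued item has waited at least `delay`,
    each with ALL of their queued items batched together."""
    users = [u for (u,) in conn.execute(
        """SELECT user_id FROM outbox
           GROUP BY user_id
           HAVING MIN(queued_at) <= ?""", (now - delay,))]
    return {
        u: [i for (i,) in conn.execute(
            "SELECT item FROM outbox WHERE user_id = ? ORDER BY queued_at",
            (u,))]
        for u in users
    }

print(due_batches(now=60.0))  # {'u1': ['a', 'b']}
```

Note how u2's item is held back because it hasn't aged 30 "minutes" yet - expressing that in a broker-side queue typically means delay queues plus consumer-side buffering.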

Well, maybe because most applications don't have the scale that requires a messaging server? A lot of projects, I would say, have an 80/20 read/write split; thus there are not that many events, and they don't need the complexity of RabbitMQ when the db can handle a few events.

Having said all that, we did actually build a tool to connect together postgresql and rabbitmq :)

https://github.com/subzerocloud/pg-amqp-bridge

When you get to a big number of events, or there are lots of event types that all need to be routed in complex ways (real-time updates), then indeed RabbitMQ is the right tool. This component enables you to execute NOTIFY in a trigger or stored procedure and have that message sent to RabbitMQ.

  • When you have a messaging server... you can start using it to help parallelize all kinds of tasks. Things you wouldn't previously think about because they'd take days or even weeks to complete, even in parallel, can all run on your infrastructure now. Spinning up 20 or even 100 new workers is easy, and follows the same principle as spinning up 1 or 2.

    That's been my experience. Before, we did everything in the DB and implicitly ignored the mere possibility of doing some tasks, because we knew it'd be too hard with the DB as the event store.

    • Agreed. To keep it simple on day 1, put your entire stack on a single box.

      1 Django web server

      1 Redis instance (can serve as cache, work queue, and results queue)

      1 Celery process for async tasks

      1 PostgreSQL instance

      (You can do the same with other stacks, I just know them less well)

      Fairly standard stack. You can find preconfigured setups that can get this going in 10 minutes or less. And boom, you just made a stack that can scale from a 1 user POC to a 10 million user product.

      This lasts until you get past the 1-box stage, and at that point you split it up by function. Then you can go from 1 Celery worker to 1,000 Celery workers just by pushing a button to make more boxes. Seems a pretty good setup.

      Versus, say, trying to skip the Redis part (which took all of 4 minutes), writing a bunch of database logic, and then having to rewrite it all down the road.

      I am all for start simple. But start simple with a path to the future. If your path to the future is "rip this all out and rewrite it", you should at least ponder what you are doing. Did I REALLY save that much time by using my database as a message queue??

Why would it not work well? Oracle has built their OracleAQ [1] in the database, and it works quite nicely. It can be accessed using native APIs, or a JMS api if that is your thing.

Also, having the queue in the database gives you easy access to the queue for management of all sorts -- SQL gives you full access to the queue data.

[1]: http://www.oracle.com/technetwork/database/oracleadvancedque...

  • Oracle sells the license to use their products at 500k per server. Of COURSE they are going to build products to use it. Doesn't matter if it's a subpar choice; they just made 500k per server.

    I am not sure it makes sense to compare some guy throwing some messages in a queue or in postgreSQL to someone spending millions to get steak dinners from Oracle and use Oracle products.

Also, Skype created PgQ (part of SkyTools) to use PostgreSQL as a queue at massive scale. As much as it is often called an 'anti-pattern', it is durable and can scale massively, way beyond the likely size of most companies, unless they need more scale than Skype did.

> Why use a database as a queue?

Quote from the article:

"You don’t need to import a large and complex 3rd party app or library to implement a queue, and you don’t need to deal with the key mapping and namespace issues with advisory locking."

  • That is a strawman though.

    "Why would you need to buy an expensive screwdriver? Your hammer will do a just dandy job of putting in screws"

    • Well, if the hammer does actually do a good enough job of putting in screws for your use case, what's the problem?

      And let's be honest: for most people, it's good enough, scalability-wise. "You are not Google" and all that.

Why would you use Kafka as a queue? It's not a great fit, I mean, you could make it work, but it's a similar square peg in round hole design choice as using a relational DB as a queue.

  • Touche.

    I think you are right. Kafka can work as a queue like a RDBMS can work as a queue, but it's not a great idea.

  • I thought that was one of the primary use cases? Is that not true?

    • Think of Kafka as a replayable append-only log. It's tuned for ingesting enormous quantities of data; it's best at buffering big data, or perhaps as an infinitely growing log of everything that happens in a system, rather than simple communication in a distributed system. It doesn't have back pressure, and its namespacing is primitive (i.e. DIY with prefixes; and avoid Confluent libraries that assume they're the only client in the system).

    • Kafka's primary use case is as an intermediary for a stream-processing system, e.g. to ingest a firehose of events, make them durable, then feed them to a number of different processors. More common use cases here: https://kafka.apache.org/uses. You _could_ use its data model to implement a 'job queue', but it's not a great fit for a number of reasons.

> Why use a database as a queue? It is known this doesn't scale well nor work particularly well.

PostgreSQL scales to 100K writes per second on my laptop.

I can probably get another 10x from a big cloud instance.

You can probably run 99+% of real applications in that range.

Pretty much like Rust users, they advertise Postgres for everything, even usage patterns that don't make sense.
