What is ‘skip locked’ for in PostgreSQL 9.5? (2016)

9 years ago (blog.2ndquadrant.com)

Why use a database as a queue? It is well known that this neither scales well nor works particularly well. If you need big scale, use something like RabbitMQ or Kafka. If you want to avoid server setup, use SQS or Cloud Pub/Sub. If you want something lighter weight, use Redis.

This is kind of like an article about how most people use a hammer to drive screws the wrong way. Which is cool, and good for learning why driving screws with a hammer is a bad idea. But the takeaway of all this should be "use the proper tool for the job", not "try to come up with a neat trick to make this work in the wrong system".

  • There are many benefits to the transactional integrity you get when using direct connections to a relational database like PostgreSQL. Putting a queue outside of your DB breaks ACID compliance and may result in silent conflicts or corruption when different services read and update the same data. DB transactions eliminate a bunch of the weird and dangerous corner cases you get with data flows performed over multiple network hops with uncertain timing, congestion, and reliability.

    The performance and scalability of monolithic DBs is often better than people expect, because the data stays close to the CPUs and doesn't move across multiple nodes on a comparatively slow and unreliable network. Trying to add transactional safety after the fact on top of a clustered db/queue is a huge headache, and you will never get it as good as transactions that happen inside a single server.

    Of course there are concerns about scalability with a monolith. Manually sharding monoliths is a bit of work, but it forces you to do your homework and find the natural and optimal sharding points. In a lot of cases there are natural separation lines, such as between groups of customers, that you can use to shard and scale things up. Blindly distributing all data across a bunch of nodes has huge performance and reliability implications. A CPU's connection to its RAM and disks is orders of magnitude faster and more reliable than connections over the network. Unless you are building something like a social network where everything is connected to everything, you don't need things to run over clusters and over the network by default.

    • Also, while queuing systems are preferable to a database for some of the properties mentioned by GP, they break down for some use cases. For example, for one operation we needed a time-based priority queue that deduplicates on some attributes and refreshes the timestamps. That's where Postgres really shines - and because it isn't that many entries or that much throughput anyway, it was one less system to operate, and it works perfectly fine to this day.

      Just keep in mind that Postgres isn't made for high contention, deletion, and mutation rates on really small tables (the Uber "Schemaless" case).
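
      A minimal sketch of such a dedup queue, with a hypothetical schema (PostgreSQL 9.5+ for ON CONFLICT and SKIP LOCKED):

          CREATE TABLE task_queue (
              dedup_key  text PRIMARY KEY,     -- attribute(s) we deduplicate on
              payload    jsonb NOT NULL,
              run_at     timestamptz NOT NULL  -- time-based priority
          );

          -- Enqueue: a duplicate refreshes the timestamp instead of adding a row.
          INSERT INTO task_queue (dedup_key, payload, run_at)
          VALUES ('user-42:sync', '{}', now() + interval '5 minutes')
          ON CONFLICT (dedup_key) DO UPDATE SET run_at = excluded.run_at;

          -- Dequeue: oldest due entry first, skipping rows other workers hold.
          SELECT dedup_key, payload FROM task_queue
          WHERE run_at <= now()
          ORDER BY run_at
          LIMIT 1
          FOR UPDATE SKIP LOCKED;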

    • In addition to all these other benefits, don't forget the impedance match you get in support. If you have a monolithic DB, the interface to all application information is the same, rather than having to learn multiple technologies.

    • > There are many benefits to the transactional integrity you get when using direct connections to a relational database like PostgreSQL. Putting a queue outside of your db breaks ACID compliance

      What? Most job queues exist to do something outside of the database. For example: I need to send some emails, or I need to resize some images. You cannot wrap sending an email in the same transaction as your queue work, nor can you wrap resizing images in one. The ENTIRE reason you are using a message queue is to go do some external work. So this literally makes no sense.

      > and may result in silent conflicts or corruption when different services read and update the same data.

      You need to program for this one way or another no matter what. If you programmed your code to produce "silent conflicts or corruption", then I guess you are going to be in trouble. So don't do that.

      > DB transactions eliminate a bunch of the weird and dangerous corner cases you will get with data flows that are performed over multiple network hops having uncertain timing, congestion and reliability.

      Again, you are missing the point. MOST job queue work is stuff outside of the database anyway. You still have stuff outside the database.

      > The performance and scalability of monolithic dbs is often better than people expect because the data stays closer to CPUs and doesn't move across multiple nodes on a relatively speaking, slow and unreliable network.

      Not that relevant. If you are doing 10,000 messages a second (or more!) to your job queue, and are looking to hold open a bunch of transactions, you are going to be in for some pain.

      > Trying to add transactional safety after the fact on top of a clustered db/queue is a huge headache and you will never get it as good as transactions that happen inside a single server.

      And trying to use PostgreSQL as a job queue is going to give you 1% or 0.1% of the throughput of RabbitMQ or Kafka or SQS or Cloud Pub/Sub as a job queue. You are trying too hard to use a database for the wrong thing.

  • > Why use a database as a queue?

    Already have a central, configured and monitored server and need "just a small queue" for something. This is not per se a bad decision. For the same reason it doesn't have to be a bad idea to cache things in the main database, instead of using a dedicated cache like Redis.

    • In my experience, the cost of adding and maintaining another storage subsystem to a project is often hugely underestimated. It's easy to see the benefits and ignore the costs.

      If I can solve a problem reasonably well by adding a table to my Postgres DB, that will always beat out adding the specialized ScrewdriverDB that does it perfectly.

    • This. Also, things like RabbitMQ are complex, and their durability properties can be different from those provided by your RDBMS. This can get problematic if you are mixing tasks with different durability requirements in the queue. For example, emailing the invoice to a client should not fail silently and should happen at most once. Same with a notification from your doctor that you need to call to discuss your test results. Tossing that into an ephemeral queue is probably not the best solution.

      Having said that, RabbitMQ does have a ton of settings where you can turn durability up/down as much as you want.

    • It also makes it easier to delete or update the queue entry in an atomic transaction that spans other tables. If that has value for the specific use case.

    • > Already have a central, configured and monitored server and need "just a small queue" for something.

      Fine. So use SQS or Cloud Pub/Sub. Both take zero server-configuration work, and you aren't adding load to what is likely the single most expensive part of your infrastructure (the RDBMS).

      (The exception, where an RDBMS is not the most expensive part of your infrastructure, is where you have very large data with some NoSQL thing or a machine-learning GPU array... but not sure that is super relevant here.)

    • Yep. Better asked as "Why use a conventional|relational database as a queue?"

      Because all queues ultimately have to be databases, i.e. provide the same qualities and assurances as databases.

  • Author of PostgreSQL SKIP LOCKED feature here. I gave a talk about queues in PostgreSQL covering the question "why do this in a database", for what it's worth:

    https://www.pgcon.org/2016/schedule/track/Applications/929.e...

    Mostly the same arguments made by commentators here.

    SKIP LOCKED exists in several major RDBMSs. We borrowed Oracle's syntax, and MySQL has just done the same. Gray and Reuter's transaction processing book calls it "read past" isolation.
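
    For reference, a typical SKIP LOCKED dequeue looks something like this (table name hypothetical); locked rows are skipped rather than waited on, so concurrent workers each claim a different job:

        -- Atomically claim and remove one pending job.
        DELETE FROM jobs
        WHERE id = (
            SELECT id FROM jobs
            ORDER BY id
            LIMIT 1
            FOR UPDATE SKIP LOCKED
        )
        RETURNING *;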

    • "...you like PostgreSQL enough to attend a conference". Future classic right there. (Not being sarcastic, I loved that joke)

  • In addition to the many good replies:

    One thing I like to point out whenever this comes up is that you should aim for a hybrid solution. Don't use a queue to store state, use it to coordinate. Databases are great at dealing with transactional state, queues are good at sequentially handing out data in order.

    For example: Say we're generating reports. First you create a table "report_tasks", each row representing a report to be generated. Let each task have a status field and a description of what's to be done. You can use foreign keys here. Then create the rows, and for each row, publish a queue message with the ID of your row. The queue consumer then needs to read the row by its ID, do the necessary processing, and update the status. You can of course also do things like avoid duplicate processing by looking at the status field.

    What this solves is introspection/queryability. Queues are for the most part opaque (although Kafka at least keeps the log around even after processing). This approach lets you know exactly what work is pending and what has been completed, and do things like periodically retry tasks or even preemptively cancel them. With RabbitMQ alone this is much more difficult, since you can't look into the queue without republishing its contents. The best you can do is funnel ACKed messages into another queue, but you can't see what's pending.
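
    A rough sketch of that flow, with hypothetical table and column names:

        CREATE TABLE report_tasks (
            id          bigserial PRIMARY KEY,
            report_id   bigint NOT NULL REFERENCES reports (id),
            status      text NOT NULL DEFAULT 'pending',  -- pending | running | done
            created_at  timestamptz NOT NULL DEFAULT now()
        );

        -- Producer: persist the task, then publish only the ID to the queue.
        INSERT INTO report_tasks (report_id) VALUES (123) RETURNING id;

        -- Consumer, on receiving an ID: claim the row only if it is still
        -- pending, which also guards against duplicate deliveries.
        UPDATE report_tasks SET status = 'running'
        WHERE id = $1 AND status = 'pending'
        RETURNING id;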

  • The biggest reason I've found for using a database as a queue is that you can actually run queries against it, both writes and reads. This is very useful when different entries in the queue can have some relationship that matters when processing.

    Do you want to wait 30 minutes after an item is queued, then process all queued items for a given entity (e.g. user) at once (see the sketch below)? Do you want to support more complex retry logic, such as efficiently handling some subset of jobs failing due to external dependencies being unavailable while others can complete? Do you want to be able to track which jobs are in progress (that doesn't really work with SKIP LOCKED alone), and even update their status while processing for finer-grained tracking? Do you want to effectively monitor the contents of the queue beyond just its size (and some queues make it quite inefficient just to get that)?

    Some of these are possible with purpose-made queueing systems but, in my experience, even when they are they can be quite inefficient, complicated, and are often poorly tested and supported areas of functionality. For example, the last time I tried to use JMS selectors with ActiveMQ (selectors let you filter messages based on a subset of SQL), it was a nightmare both performance-wise and functionality-wise, and I wished I just had a database that was actually built to support SQL.

    Other points about the cost of introducing a new technology (especially if you need strong durability, availability, etc.) and being able to update your database in the same transaction are also valid, but can often be dealt with by some extra effort.

    Don't get me wrong, if you just need a queuing system the first thing to consider is a system designed for that purpose, especially if you aren't very knowledgeable about databases and have basic requirements. It can be nuanced to correctly and efficiently use a general purpose database as queue. At the end of the day, however, keep in mind that under the hood a queueing system is just a database that exposes a limited API that makes it easier to get basic operations right and hopefully easier to scale (sadly I have seen all too many queuing systems that utterly fail at that part) at the expense of flexibility.
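
    As one illustration, the "wait 30 minutes, then batch by user" case above is a single statement against a hypothetical jobs table (a sketch, not production-hardened):

        -- Claim every unclaimed job for one user whose oldest job has been
        -- queued for at least 30 minutes. Concurrent claimers are filtered
        -- out by the claimed_by IS NULL recheck.
        UPDATE jobs
        SET claimed_by = $1, claimed_at = now()
        WHERE claimed_by IS NULL
          AND user_id = (
              SELECT user_id FROM jobs
              WHERE claimed_by IS NULL
              GROUP BY user_id
              HAVING min(queued_at) < now() - interval '30 minutes'
              LIMIT 1
          )
        RETURNING *;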

  • Well, maybe because most applications don't need or have the scale that requires a messaging server? A lot of projects, I would say, have an 80/20 read/write split; thus there are not that many events, and they don't need the complexity of RabbitMQ when the DB can handle a few events.

    Having said all that, we did actually build a tool to connect together postgresql and rabbitmq :)

    https://github.com/subzerocloud/pg-amqp-bridge

    When you get to a big number of events, or there are lots of event types that all need to be routed in complex ways (real-time updates), then indeed RabbitMQ is the right tool. This component enables you to execute NOTIFY in a trigger or stored procedure and have that message sent to RabbitMQ.
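
    For example, a trigger along these lines (function, table, and channel names hypothetical) lets each INSERT announce itself; the bridge then forwards the notification to RabbitMQ:

        CREATE FUNCTION notify_new_task() RETURNS trigger AS $$
        BEGIN
            -- Channel 'tasks', payload = the new row's ID.
            PERFORM pg_notify('tasks', NEW.id::text);
            RETURN NEW;
        END;
        $$ LANGUAGE plpgsql;

        CREATE TRIGGER task_inserted
        AFTER INSERT ON tasks
        FOR EACH ROW EXECUTE PROCEDURE notify_new_task();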

    • When you have a messaging server.... you can start using it to help parallelize all kinds of tasks. Things you wouldn't previously think about because they'd take days or even weeks to complete, even in parallel, can all run on your infrastructure now. Spinning up 20 or even 100 new workers is easy, and follows the same principle as spinning up 1 or 2.

      That's been my experience. Before, we did everything in the DB, and implicitly ignored the mere possibility of doing some tasks because we knew it'd be too hard with the DB as the event store.

  • Why would it not work well? Oracle has built OracleAQ [1] in the database, and it works quite nicely. It can be accessed using native APIs, or a JMS API if that is your thing.

    Also, having the queue in the database gives you easy access to the queue for management of all sorts -- SQL gives you full access to the queue data.

    [1]: http://www.oracle.com/technetwork/database/oracleadvancedque...

    • Oracle sells the license to use their products at 500k per server. Of COURSE they are going to build products that use it. It doesn't matter if it's a subpar choice; they just made 500k per server.

      I am not sure it makes sense to compare some guy throwing some messages in a queue or in PostgreSQL to someone spending millions to get steak dinners from Oracle and use Oracle products.

  • Also, Skype created PgQ (part of SkyTools) to use PostgreSQL as a queue at massive scale. As much as it is often called an 'anti-pattern', it is durable and can scale way beyond the likely size of most companies, unless they need more scale than Skype.

  • > Why use a database as a queue?

    Quote from the article:

    "You don’t need to import a large and complex 3rd party app or library to implement a queue, and you don’t need to deal with the key mapping and namespace issues with advisory locking."

  • Why would you use Kafka as a queue? It's not a great fit, I mean, you could make it work, but it's a similar square peg in round hole design choice as using a relational DB as a queue.

  • > Why use a database as a queue? It is known this doesn't scale well nor work particularly well.

    PostgreSQL scales to 100K writes per second on my laptop.

    I can probably get another x10 from a big cloud instance.

    You can probably run 99+% of real applications in that range.

  • Pretty much like Rust users, they advertise Postgres for everything, even usage patterns that don't make sense.

(possibly dumb q, pls be gentle)

Hmmm... What happens if the application crashes immediately after removing the work item but before it can do anything else? Doesn't this break exactly-once semantics... ?

i.e. wouldn't a complete implementation include a second table with "who's working on what" and a transaction that moves records from one table to the other... and come to think of it, why not store both in the same table? I.e. don't delete records, just mark them as taken and include a timestamp so we can time out and give the work to another worker?

UPDATE queue_table
SET taken_by = :myId, time_taken = now()
WHERE id = (
    SELECT id FROM queue_table
    WHERE taken_by IS NULL
    LIMIT 1
    FOR UPDATE SKIP LOCKED  -- without this, a blocked worker can come back empty-handed
)
RETURNING id;

Note: completion can be signalled by setting taken_by to NULL, or by adding another column, e.g. a completion time, which then enables computing stats on completion times.

For high-volume systems we eventually want to garbage collect, but that's easy since we have timestamps: put a partial index on the timestamp (WHERE time_taken IS NOT NULL) and scan the oldest ones...
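
Something like the following, reusing queue_table from above (retention window illustrative):

    -- Partial index covers only completed rows, keeping it small.
    CREATE INDEX queue_table_done_idx ON queue_table (time_taken)
    WHERE time_taken IS NOT NULL;

    -- Periodic cleanup of long-completed items.
    DELETE FROM queue_table
    WHERE taken_by IS NULL            -- completed, per the convention above
      AND time_taken IS NOT NULL
      AND time_taken < now() - interval '7 days';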

  • > What happens if the application crashes immediately after removing the work item but before it can do anything else? Doesn't this break exactly-once semantics... ?

    If the connection is broken the transaction would be aborted and the lock released. If the worker hit an infinite loop or something like that you'd use something like `idle_in_transaction_session_timeout` to set a transaction timeout and/or have a worker monitoring system in place to kill long running jobs.

    The important thing to note is that a single DB instance and a client still comprises a distributed system. You still have almost all the same problems with 'exactly-once' semantics you would with a distributed queue. You should make all processing jobs idempotent and support retry regardless of the tech backing the queue if you want a system that provides effectively exactly-once semantics.
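
    A sketch of those guard rails (values illustrative; `idle_in_transaction_session_timeout` arrived in PostgreSQL 9.6):

        -- Abort a session that sits idle inside a transaction (e.g. a hung
        -- worker); its row locks are released and the job becomes claimable.
        SET idle_in_transaction_session_timeout = '60s';

        -- Optionally cap individual statement runtime for worker sessions.
        SET statement_timeout = '5min';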

    • ah! you're assuming the work is performed inside the same transaction as the dequeue operation, with locks held for the duration?

      If so...

      While I suppose row-level locking technically solves contention, it still feels like we're "asking for trouble" by holding database locks while clients perform arbitrarily long work operations. There are also practical issues when the work itself is distributed and the original client can't keep state around, i.e. it has to end the low-level transaction.

      Hence my poor-man's question/proposal using worker IDs and timeouts...

  • The row lock is held by the RDBMS itself on behalf of the client. When the client goes away or times out (configurable), the transaction aborts, and the RDBMS releases the lock. You don't get exactly-once here, because you can still crash between performing some action on the work item and marking the item as finished and committing. This is an at-least-once solution in the general case.

    • sorry!! I'm not sure I understand your response? When you say 'You don't get exactly once here', what does 'here' refer to?

      Upon reflection, true exactly-once semantics requires multi-phase commit, transitively through the system - this sort of tight coupling is tricky in practice.

      Hence my question/proposal, which is a practical, poor-man's solution using timeouts.

database rule #1: do not use your database as a work queue.

this article does a great job discussing why not, and finishes with a sane implementation that would work, but would fall over and shudder to a halt under a moderate amount of load (which is mentioned in the article too).

doing queue workloads in an RDBMS is a recipe for index contention, bugs, or both.

  • I've found that co-locating my queue with my application data enables a very powerful pattern around persisting actions and queuing up callbacks -- especially useful for smaller applications where scale isn't as much of a factor. It means I can persist both my data and my background jobs in a single transaction, and if something goes wrong during that transaction they both rollback (before the jobs are picked up by any workers). No need to coordinate two different systems (which can be full of pitfalls).
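
    A sketch of that pattern, with hypothetical tables:

        BEGIN;
        -- Domain change and its background job commit or roll back together.
        INSERT INTO invoices (customer_id, total) VALUES (42, 99.00);
        INSERT INTO jobs (kind, payload)
        VALUES ('send_invoice_email', '{"customer_id": 42}');
        COMMIT;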

    • Yes, for infrequent yet high-value tasks where consistency is paramount, DB queues are king. The alternative using an externalized queue is either non-transactional without failure handling (yet everybody pretends it is transactional), transactional with 2PC where recovery was never thought through (or tested), or devolves to having the equivalent of a DB queue anyway to track work, with re-insertion of work into the primary queue.

  • Define "moderate amount of load".

    More importantly, how many existing Postgres users would fall in that bucket?

    • Right. I use a MySQL-based queue for a moderate amount of load, but that's because it's relatively few tasks that do relatively much work per task. But if your idea of moderate load is hundreds of tasks per second that individually do very little work each, then you may be right.

Locking the task row and doing the work in the same transaction has the nice property that the task will be available again if the worker fails and the transaction is rolled back.

Unfortunately this may cause problems when the tasks take considerable time to complete and long running transactions are the result: "Postgres Job Queues & Failure By MVCC" https://brandur.org/postgres-queues

Why not use scheduled, timestamped checkpoint synchronization? Then you get loose coupling.

I noticed long ago that RDBMS users don't actually care about consistency, so it's always kind of wrong at some level, and when things break it's not that big of a deal for them. It's just annoying to see people claiming that transactions are easy, when they are only easy to do incorrectly.