Comment by lelanthran

17 hours ago

> But I think it totally depends on what your queue is used for.

Agreed

> I deal with it using updates rather than deleting from the queue, because I need a log of what happened for audit purposes. If I need to optimize later, I can easily partition the table. At the start, I just use a partial index for the items to be processed.

That's a good approach. I actually have this type of queue in production, and my need is similar to yours, but the expected load is a lot lower: it counts as an error if even a few thousand work items get added to the queue in a single day (this queue is used for user notifications, and even very large clients have only a few thousand users).

So, my approach is to have a retry column that is decremented toward zero each time I retry a work item; items with a zero in the retry column get ignored.

One worker runs periodically (currently every 1m) and processes only those rows with a non-zero retry column and the `incomplete` flag set.
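To make that concrete, here is a rough sketch of what the periodic worker could look like. It uses SQLite through Python's `sqlite3` module only so that it runs standalone; the table name `work_items`, the columns `retries_left` and `state`, the index name, and the `handler` callback are all made up for illustration and are not my actual schema. It also assumes an item is marked `done` on success and only loses a retry on failure.

```python
import sqlite3

# Hypothetical schema: names and the default of 3 retries are assumptions.
QUEUE_SCHEMA = """
CREATE TABLE work_items (
    id           INTEGER PRIMARY KEY,
    payload      TEXT    NOT NULL,
    retries_left INTEGER NOT NULL DEFAULT 3,
    state        TEXT    NOT NULL DEFAULT 'incomplete'
);
-- Partial index: only rows that still need processing are indexed.
CREATE INDEX work_items_pending
    ON work_items (id)
    WHERE state = 'incomplete' AND retries_left > 0;
"""

def process_pending(conn, handler):
    """Run one pass over the pending rows; return how many were attempted."""
    rows = conn.execute(
        "SELECT id, payload FROM work_items "
        "WHERE state = 'incomplete' AND retries_left > 0 "
        "ORDER BY id"
    ).fetchall()
    for item_id, payload in rows:
        try:
            ok = bool(handler(payload))
        except Exception:
            ok = False  # treat any handler failure as a failed attempt
        if ok:
            conn.execute(
                "UPDATE work_items SET state = 'done' WHERE id = ?",
                (item_id,),
            )
        else:
            # Failed attempt: use up one retry; at zero the row is ignored.
            conn.execute(
                "UPDATE work_items SET retries_left = retries_left - 1 "
                "WHERE id = ?",
                (item_id,),
            )
        conn.commit()
    return len(rows)
```

Calling `process_pending(conn, handler)` from a once-a-minute scheduler would approximate the behaviour described above.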

A different worker runs every 10m and moves expired rows (those with the retry column set to zero) and completed rows (those with a column set to `done` or similar) to a different table for audit/logging purposes. This is why I said that the table containing workitems will almost never be reindexed: all rows added will eventually be removed.
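A minimal sketch of that cleanup worker, again with SQLite via Python's `sqlite3` so it is self-contained. The table names `work_items` and `work_items_archive`, the column names, and the function name are hypothetical. The copy and the delete share one transaction, so a row should never be lost or end up in both tables.

```python
import sqlite3

# Hypothetical schema for the live queue and its audit/log table.
ARCHIVE_SCHEMA = """
CREATE TABLE work_items (
    id           INTEGER PRIMARY KEY,
    payload      TEXT    NOT NULL,
    retries_left INTEGER NOT NULL DEFAULT 3,
    state        TEXT    NOT NULL DEFAULT 'incomplete'
);
CREATE TABLE work_items_archive (
    id           INTEGER PRIMARY KEY,
    payload      TEXT    NOT NULL,
    retries_left INTEGER NOT NULL,
    state        TEXT    NOT NULL
);
"""

# Rows that are finished: either completed or out of retries.
FINISHED = "state = 'done' OR retries_left = 0"

def archive_finished(conn):
    """Move finished rows to the archive table; return how many were moved."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO work_items_archive (id, payload, retries_left, state) "
            "SELECT id, payload, retries_left, state FROM work_items "
            "WHERE " + FINISHED
        )
        cur = conn.execute("DELETE FROM work_items WHERE " + FINISHED)
    return cur.rowcount
```

Since every row eventually matches the `FINISHED` condition, the live table stays small no matter how long the system runs.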

------------------------------------------------

The real problem is that the processing cannot be done atomically, even when there is only a single worker.

For example, if the processing is "send email", your system might go down after calling the `send_email()` function in your code and before calling `decrement_retry()`.

No amount of locking, n-phase commits, etc. can ever prevent the case where the email is sent but the retry counter is not decremented. This is not a solvable problem, so I am prepared to live with it for now, with the understanding that the odds are low that this situation will come up, and if it does, the impact is low as well (user gets two notification emails for the same item).
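Here is a toy simulation of that failure window, purely to illustrate the point. `send_email()` and `decrement_retry()` are the functions named above; everything else (`SimulatedCrash`, the dict-based queue, the `outbox` list, and marking the item `done` right after the decrement) is invented for the sketch.

```python
class SimulatedCrash(Exception):
    """Stands in for the process dying; nothing after it gets to run."""

def send_email(outbox, item):
    # The side effect that cannot be rolled back once it has happened.
    outbox.append(item["id"])

def decrement_retry(item):
    item["retries_left"] -= 1

def run_worker(queue, outbox, crash_after_send=False):
    for item in queue:
        if item["done"] or item["retries_left"] == 0:
            continue  # finished or expired items are ignored
        send_email(outbox, item)
        if crash_after_send:
            raise SimulatedCrash()  # the email is out, bookkeeping is not
        decrement_retry(item)
        item["done"] = True

queue = [{"id": 1, "retries_left": 3, "done": False}]
outbox = []

try:
    run_worker(queue, outbox, crash_after_send=True)
except SimulatedCrash:
    pass  # "restart": the item still looks untouched in the queue

run_worker(queue, outbox)  # the next run sends the same email again
```

After the second run, `outbox` holds the same item id twice: the item was delivered at least once, not exactly once.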