
Comment by thewisenerd

1 day ago

t1: select for update where status=pending, set status=processing

t2: update, set status=completed|error

these are two independent, very short transactions? or am i misunderstanding something here?
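
For concreteness, a minimal sketch of that t1/t2 split, assuming a 'tasks' table with id/status/payload columns and psycopg2. SKIP LOCKED is my addition so concurrent workers don't block each other, and do_work is a hypothetical handler:

    import psycopg2

    conn = psycopg2.connect("dbname=jobs")  # hypothetical DSN

    # t1: claim one pending task and mark it 'processing' (commits immediately)
    with conn, conn.cursor() as cur:
        cur.execute("""
            UPDATE tasks SET status = 'processing'
            WHERE id = (
                SELECT id FROM tasks
                WHERE status = 'pending'
                ORDER BY id
                FOR UPDATE SKIP LOCKED
                LIMIT 1)
            RETURNING id, payload
        """)
        row = cur.fetchone()

    if row is not None:
        task_id, payload = row
        ok = do_work(payload)  # hypothetical handler; runs outside any transaction

        # t2: record the outcome (second short transaction)
        with conn, conn.cursor() as cur:
            cur.execute("UPDATE tasks SET status = %s WHERE id = %s",
                        ('completed' if ok else 'error', task_id))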

--

edit:

i think i'm not seeing what the 'transaction at start of processor' logic is; i'm thinking more of a polling logic

    while true:
      r := select for update   # claim one 'pending' task
      if r is None:
        return                 # queue drained; nothing to do
      process(r)               # (processing step, implied)
      sleep a bit

this obviously has the drawback of knowing how long to sleep for; and tasks not getting "instantly" picked up, but eh, tradeoffs.

Your version makes sense. I understood the OP's approach as being different.

Two short transactions (very short, if indexed properly) at start and end are a good solution. One caveat is that the worker can die after t1 but before t2 - hence jobs need a timeout concept and should be idempotent for safe retrying.

This gets you "at least once" processing.
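
One sketch of that timeout concept: have t1 also stamp a 'claimed_at' column, and run a periodic sweeper (the column name and the 10-minute cutoff are assumptions on my part):

    # Hypothetical sweeper: return timed-out 'processing' tasks to the queue.
    # Only safe because jobs are idempotent - this re-queueing is where
    # "at least once" (possible duplicate execution) comes from.
    with conn, conn.cursor() as cur:
        cur.execute("""
            UPDATE tasks
            SET status = 'pending', claimed_at = NULL
            WHERE status = 'processing'
              AND claimed_at < now() - interval '10 minutes'
        """)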

> this obviously has the drawback of knowing how long to sleep for; and tasks not getting "instantly" picked up, but eh, tradeoffs.

Right. I've had success with exponential backoff sleep. In a busy system, this means sleeps remain either zero or very short.
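
Something like this sketch (the base/cap numbers are made up):

    import time

    def poll_loop(claim_task, process, base=0.05, cap=5.0):
        """Hypothetical polling loop; claim_task() returns a task or None."""
        delay = 0.0
        while True:
            task = claim_task()
            if task is not None:
                process(task)
                delay = 0.0  # busy system: next poll happens immediately
            else:
                delay = min(cap, max(base, delay * 2))  # empty: back off
                time.sleep(delay)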

Another solution is Postgres LISTEN/NOTIFY: workers listen for events and PG wakes them up. On the happy path, this gets you instant job pickup. It should be allowed to fail open and understood as a happy-path optimization.

As delivery can fail, this gets you "at most once" processing (which is why this approach by itself is not enough to drive a persistent job queue).
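
Roughly, on the worker side with psycopg2 (the channel name and drain_queue are made up; the select() timeout is the fail-open part - if a NOTIFY is lost, the worker still polls eventually):

    import select
    import psycopg2

    conn = psycopg2.connect("dbname=jobs")  # hypothetical DSN
    conn.autocommit = True                  # notifications arrive outside a transaction
    with conn.cursor() as cur:
        cur.execute("LISTEN new_task")      # enqueuers run: NOTIFY new_task

    while True:
        drain_queue()                       # hypothetical: poll the table until empty
        # Block until a notification (or the timeout) wakes us, then poll again.
        if select.select([conn], [], [], 30.0) != ([], [], []):
            conn.poll()
            conn.notifies.clear()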

A caveat with LISTEN/NOTIFY is that it doesn't scale due to locking [1].

[1]: https://www.recall.ai/blog/postgres-listen-notify-does-not-s...

  • What are your thoughts on using Redis Streams, or using a table instead of LISTEN/NOTIFY (either a table per topic, or a table with a compound primary key that includes a topic - possibly a temporary table)?

    • I've not used Redis Streams, but it might work. I've seen folks advise against PG, in favor of Redis for job queues.

      > using a table instead of LISTEN/NOTIFY

      What do you mean? The job queue is backed by a PG table. You could optionally layer LISTEN/NOTIFY on top.

      I've had success with a table with compound, even natural, primary keys, yes. Think "(topic, user_id)". The idea is to allow for PARTITION BY should the physical tables become prohibitively large. The downsides of PARTITION BY don't apply for this use case, but the upsides do (in theory - I've not actually executed on this bit!). A rough sketch follows below.

      Per "topic", there's a set of workers which can run under different settings (e.g. number of workers to allow horizontal scaling - under k8s, this can be automatic via HorizontalPodAutoscaler and dispatching on queue depth!).

They're proposing doing it in one transaction as a heartbeat.

> - If you find an unlocked task in 'executing', you know the processor died for sure. No heuristic needed
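
As I read the quoted idea, the detection query would be something like this sketch (table/column names assumed); whether such rows can ever exist is exactly the objection below:

    # Rows that claim to be running but that no session currently holds a lock on.
    with conn, conn.cursor() as cur:
        cur.execute("""
            SELECT id FROM tasks
            WHERE status = 'executing'
            FOR UPDATE SKIP LOCKED
        """)
        orphans = cur.fetchall()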

  • Yes, and that cannot work: if a task is unlocked but in 'executing' state, how was it unlocked but its state not updated?

    If a worker/processor dies abruptly, it will neither unlock nor set the state appropriately. It won't have the opportunity. Conceptually, this failure mode can always occur (think power loss).

    If such a disruption happened, yet you later find tasks unlocked, they must have been unlocked by another system - perhaps Postgres itself, via a daemon that kills long-running transactions/locks. At which point we are back to square one: the job scheduling should be robust against this in the first place.