Comment by valenterry

1 day ago

The problem is that you now have to poll based on an index (maybe BRIN isn't too bad though) and you have to overwrite the row afterwards and update the index. That means you are creating a dead tuple for every row (and one more if you mark it to be "completed").

1 comment

valenterry

saltcured 6 hours ago

Yes, everything is tradeoffs.

When trying to make good use of RDMBS transactional semantics, I think an important mental shift is to think of there being multiple async processing domains rather than a single magical transaction space. DB transactions are just communication events, not actual business work. This is how the relational DB can become the message broker.

The agents need to do something akin to 2-phase commit protocols to record their "intent" and their "result" across different business resources. But, for a failure-prone, web style network of agents, I would not expose actual DB 2-phase commit protocols. Instead, the relational model reifies the 2-phase-like state ambiguity of particular business resources as tuples, and the agents communicate important phases of their work process with simpler state update transactions.

It's basically the same pattern as with safe use of AMQP, just replacing one queue primitive with another. Both approaches require delayed acknowledgement patterns, so tasks can be routed to an agent but not removed from the system until after the agent reports the work complete. Either approach has an lost or orphaned task hazard if naively written to dequeue tasks earlier in the work process. An advantage of the RDBMS-based message broker is that you can use also use SQL to supervise all the lifecycle state, or even intervene to clean up after agent failures.

In this approach, don't scale-up a central RDMBS by disabling all its useful features in a mad dash for speed. Instead, think of the network of async agents (human or machine) and RDMBS message broker(s) to make for their respective traffic. This agent network and communication workload can often be partitioned to reach scaling goals. E.g. specific business resources might go into different "home" zones with distinct queues and agent pools. Their different lifecycle states do not need to exist under a single, common transaction control.