Comment by hombre_fatal
2 days ago
Yeah, but pub/sub systems already need to be robust to missed messages. And, sending the notify after the transaction succeeds usually accomplishes everything you really care about (no false positives).
2 days ago
Yeah, but pub/sub systems already need to be robust to missed messages. And, sending the notify after the transaction succeeds usually accomplishes everything you really care about (no false positives).
What happens when transaction succeeds but the execution of NOTIFY fails if it's outside of transaction, in it's own separate connection?
For reliability, you can make the recipient poll the table(s) of record for relevant state and use the out-of-band notification channel as a latency-reducer. So, the poller is eventually consistent at some configured polling interval, but opportunistically can respond much sooner when told to check again ahead of the next scheduled poll time.
In my experience, this means you make sure the polling solution is complete and correct, and the notifier gets reduced to a wake-up signal. This signal doesn't even need to carry the actionable change content, if the poller can already pose efficient queries for whatever "new stuff" it needs.
This approach also allows the poller to keep its own persistent cursor state if there is some stateful sequence to how it consumes the DB content. It automatically resynchronizes and the notification channel does not need to be kept in lock-step with the consumption.
> you can make the recipient poll the table(s) of record for relevant state
That is tricky due to transactions and visibility. How do you write the poller to not miss events that were written by a long/blocked transaction? You'd have to set the poller scan to a long time (e.g. "process events that were written since now minus 5minutes") and then make sure transactions are cancelled hard before those 5minutes.
1 reply →
fwiw - that's what Oban did for the most part. It sent a signal to a worker that there was a new job to pick up and work on. At scale, even that was an issue.
The same thing that happens if the notified process dies suddenly.
If you're not handling that, then whatever you're doing is unreliable either way.
98% of developers can't see it