Yeah, but pub/sub systems already need to be robust to missed messages. And, sending the notify after the transaction succeeds usually accomplishes everything you really care about (no false positives).
What happens when the transaction succeeds but the NOTIFY itself fails, if it's executed outside of the transaction, in its own separate connection?
For reliability, you can make the recipient poll the table(s) of record for relevant state and use the out-of-band notification channel as a latency-reducer. So, the poller is eventually consistent at some configured polling interval, but can opportunistically respond much sooner when told to check again ahead of the next scheduled poll time.
In my experience, this means you make sure the polling solution is complete and correct, and the notifier gets reduced to a wake-up signal. This signal doesn't even need to carry the actionable change content, if the poller can already pose efficient queries for whatever "new stuff" it needs.
This approach also allows the poller to keep its own persistent cursor state if there is some stateful sequence to how it consumes the DB content. It automatically resynchronizes and the notification channel does not need to be kept in lock-step with the consumption.
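To make that concrete, here is a minimal sketch of the shape I mean, not anyone's production code. It uses SQLite and a `threading.Event` as stand-ins so it runs anywhere; the `events` table, the `poller_cursor` table, and the `WakeablePoller` name are all made up for illustration. With Postgres, the LISTEN callback would simply call `notify()`.

```python
import sqlite3
import threading


class WakeablePoller:
    """Polls a table of record; a notification only shortens the wait."""

    def __init__(self, conn, poll_interval=5.0):
        self.conn = conn
        self.poll_interval = poll_interval
        self.wakeup = threading.Event()  # set by whatever listens for notifications
        # Persistent cursor (hypothetical table): survives restarts of the poller.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS poller_cursor ("
            "name TEXT PRIMARY KEY, last_id INTEGER NOT NULL)"
        )
        conn.execute(
            "INSERT OR IGNORE INTO poller_cursor (name, last_id) VALUES ('main', 0)"
        )
        conn.commit()

    def notify(self):
        """Wake-up signal: carries no payload, just 'check again now'."""
        self.wakeup.set()

    def poll_once(self, handle):
        """Handle every row past the cursor, then advance the cursor."""
        (last_id,) = self.conn.execute(
            "SELECT last_id FROM poller_cursor WHERE name = 'main'"
        ).fetchone()
        rows = self.conn.execute(
            "SELECT id, body FROM events WHERE id > ? ORDER BY id", (last_id,)
        ).fetchall()
        for row in rows:
            handle(row)
        if rows:
            # Advance only after handling, so a crash re-delivers (at-least-once).
            self.conn.execute(
                "UPDATE poller_cursor SET last_id = ? WHERE name = 'main'",
                (rows[-1][0],),
            )
            self.conn.commit()
        return len(rows)

    def wait(self):
        """Sleep until the next scheduled poll or an early wake-up."""
        woken = self.wakeup.wait(timeout=self.poll_interval)
        self.wakeup.clear()
        return woken

    def run(self, handle, stop):
        """Main loop; `stop` is a threading.Event used to shut down."""
        while not stop.is_set():
            self.poll_once(handle)
            self.wait()
```

A lost notification costs at most one `poll_interval` of latency, and a duplicate one costs an empty query. One caveat if you port this to a concurrent database: an `id > cursor` query can skip rows when IDs are assigned out of commit order, so the "new stuff" query may need something sturdier than a bare sequence.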
The same thing that happens if the notified process dies suddenly.
If you're not handling that, then whatever you're doing is unreliable either way.
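On the sending side, "handling that" can be as small as treating the notify as best-effort. A rough sketch, again with SQLite standing in for the real database; the `events` table and the `notify` callable are hypothetical placeholders for whatever sends the NOTIFY on its own connection:

```python
import sqlite3


def record_then_notify(conn, body, notify):
    """Commit the row first, then send a best-effort wake-up.

    Returns (row_id, notified). A failed notify is not an error here: the
    row is already durable, and the consumer's next scheduled poll finds it.
    """
    with conn:  # commits on success, rolls back if the insert raises
        row_id = conn.execute(
            "INSERT INTO events (body) VALUES (?)", (body,)
        ).lastrowid
    try:
        notify()  # only reached after a successful commit: no false positives
        return row_id, True
    except Exception:
        return row_id, False  # missed wake-up; polling covers the gap
```

If the insert fails, `notify` is never called, and if `notify` fails, nothing is rolled back. The only thing lost in the second case is latency.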
... And working outside of the guarantee is harder, especially if you're in a "move fast and break things because we can fix it later" mode.
Anyway, the article indicates that the fix was very simple and primarily in the application layer. Makes me wonder if someone was getting "creative" when they used LISTEN/NOTIFY.