Comment by kayodelycaon

1 year ago

Crash-only is really hard to implement if another system is involved that isn't crash-only. If you crash in the middle of a network request, you may not know what state the other system is in.

I've had to deal with buggy mainframe software whose error messages bore no relation to how much of an operation had actually succeeded. (And there was no way to ask it after the fact...) Welcome to the special hell.

4 comments

TheDudeMan 1 year ago

Idempotent APIs + sane timeouts + retries.
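A minimal sketch of that combination, under stated assumptions: `FlakyLedger` is a hypothetical stand-in for the remote system, and its lost replies are simulated with `TimeoutError` rather than a real socket timeout. The point is that the client generates one idempotency key per logical operation and reuses it on every retry, so the server can apply the side effect at most once.

```python
import uuid


class FlakyLedger:
    """Hypothetical remote service that deduplicates on an idempotency key."""

    def __init__(self, drop_first_n_replies=0):
        self.balance = 0
        self.seen = {}  # idempotency key -> reply stored for that key
        self._drops_left = drop_first_n_replies

    def deposit(self, key, amount):
        # Apply the side effect only the first time this key is seen.
        if key not in self.seen:
            self.balance += amount
            self.seen[key] = self.balance
        # Simulate a reply lost in transit: the work is done, but the
        # caller only observes a timeout.
        if self._drops_left > 0:
            self._drops_left -= 1
            raise TimeoutError("reply lost")
        return self.seen[key]


def deposit_with_retries(ledger, amount, max_attempts=5):
    """Retry on timeout, reusing the same key for every attempt."""
    key = str(uuid.uuid4())  # one key per logical operation, not per attempt
    for _ in range(max_attempts):
        try:
            return ledger.deposit(key, amount)
        except TimeoutError:
            continue  # a real client would also back off between attempts
    raise TimeoutError(f"gave up after {max_attempts} attempts")
```

With two replies dropped, the call still returns the balance after a single deposit. This only works when the other side actually honors the key, which is exactly what the mainframe in the parent comment did not offer.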

WhyNotHugo 1 year ago

Regular software can crash in the middle of a network request too (e.g., someone accidentally unplugs the wrong network cable, a power outage, etc.).

Crash-only software is more likely to have actually tested recovery from such situations.
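A rough sketch of why recovery gets exercised, assuming a hypothetical append-only journal file and a `send(key, payload)` callable that is idempotent per key (neither comes from the thread). Startup and crash recovery are the same function, so every normal start runs the recovery code. The sketch also assumes journal lines are never torn by a crash mid-write, which a real implementation would have to handle.

```python
import json
import os


def append(journal_path, record):
    """Durably append one JSON record to the journal."""
    with open(journal_path, "a") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())


def run_pending(journal_path, send):
    """Replay every journaled intent that has no matching 'done' record.

    Called on every start, whether the previous run exited cleanly or crashed.
    Returns the keys that were replayed.
    """
    pending = {}
    if os.path.exists(journal_path):
        with open(journal_path) as f:
            for line in f:
                record = json.loads(line)
                if record["type"] == "intent":
                    pending[record["key"]] = record["payload"]
                elif record["type"] == "done":
                    pending.pop(record["key"], None)
    for key, payload in pending.items():
        # A crash between send() and the 'done' record causes a resend on
        # the next start, which is why send() must be idempotent per key.
        send(key, payload)
        append(journal_path, {"type": "done", "key": key})
    return list(pending)
```

If the remote side cannot deduplicate on the key, the replay can double-apply an operation, which is the failure mode the original comment describes.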

TheDudeMan 1 year ago

Your comment suggests that you believe crash-only software to be inherently less reliable than the alternative. But that is the opposite of its stated goal and supposed benefits.

01HNNWZ0MV43FF 1 year ago

Tbf, isn't that equivalent to a network partition followed by rebooting or replacing one node? The network can go down at any point in the middle of an operation.