Comment by randfur

14 hours ago

Do people actually believe these dot points or are they just out of scope for most applications to tackle beyond letting the user try again?

I have had a developer with anger issues expect 100% success with FTP file transfers, and anything that failed was 100% my fault as a Linux/Oracle administrator.

These FTP sessions were running over WANs connecting Pennsylvania, Iowa, and Tennessee.

I ended up writing him an "until curl ftp://...; do echo it failed again; done" loop which calmed that particular issue down.
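For what it's worth, that `until` loop retries forever with no pause. A bounded version with exponential backoff might look roughly like the sketch below. The names `retry`, `curl_fetch`, `max_attempts` and `base_delay` are made up for illustration, and the URL is left as a parameter because the original one is elided.

```python
import subprocess
import time
from typing import Callable, Optional


def retry(
    operation: Callable[[], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Call operation until it returns True or the attempts run out.

    Returns the 1-based number of the attempt that succeeded,
    or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if operation():
            return attempt
        print(f"attempt {attempt} failed")
        if attempt < max_attempts:
            # Double the wait after each failure: 1s, 2s, 4s, ...
            sleep(base_delay * 2 ** (attempt - 1))
    return None


def curl_fetch(url: str) -> bool:
    """Hypothetical wrapper: run curl once and report whether it exited with 0."""
    return subprocess.run(["curl", "-O", url]).returncode == 0
```

Usage would be something like `retry(lambda: curl_fetch(url))`. Returning `None` once the attempts are exhausted means the caller has to decide what happens next, rather than the loop spinning indefinitely against a link that is actually down.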

I don't miss that guy, not even 1%. Good riddance.

Perfect demonstration of the fallacies in action! If you were used to developing applications on a self-contained platform, you would think something like “sure, if it fails the user can try again”.

On a distributed system, the user can only try again if the platform has remained stable, the failure is transient (*), and they have (crucially) been given the information they need to retry.

The platform that provides a stable environment for the user to just try again has been built on these principles.

(*) “There is one administrator” assumes it is within the user’s power to resolve the issue.
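A rough sketch of what "given the information to retry" could mean in code: separate failures that might clear up on their own from ones that won't, and tell the user which one they hit. The `TransientError`, `PermanentError`, `Outcome` and `attempt` names are all hypothetical; a real system would map its own error types onto these categories.

```python
from dataclasses import dataclass
from typing import Callable


class TransientError(Exception):
    """Hypothetical category: a failure that may succeed if retried."""


class PermanentError(Exception):
    """Hypothetical category: a failure that retrying will not fix."""


@dataclass
class Outcome:
    ok: bool
    retryable: bool
    message: str


def attempt(operation: Callable[[], None]) -> Outcome:
    """Run operation once and report whether a retry is worth offering."""
    try:
        operation()
    except TransientError as exc:
        return Outcome(False, True, f"Temporary failure ({exc}); trying again may help.")
    except PermanentError as exc:
        # Retrying cannot help here, so point the user at someone who can fix it.
        return Outcome(False, False, f"Failed ({exc}); retrying will not help.")
    return Outcome(True, False, "Done.")
```

The point of the `retryable` flag is that "just try again" is only offered when the failure is the kind a retry can fix.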

  • >we'll just add this feature on as some async verification since it takes a while, then make the original update wait in some weird state for it to finish.

    Later, when users are confused by failures and weird states:

    >ok now let's build a new system that tries to gather all this information on updates in "weird states" and let users fix them!

    Simplified example, but a nightmare.

    • If you’re exposing system concerns mixed in with application code, you’re either doing it wrong or using some outdated architecture.

      Either way, it’s no excuse for shipping slop, which is what you’ve done if your software only works under limited, idealised circumstances.

      TFA is for you