Comment by tsimionescu

5 years ago

That is indeed a very common need in a non-memory-safe language.

Still, not all bugs have unbounded impact. As long as memory is not corrupted, things like off-by-one errors and null pointer exceptions can often be safely recovered from by simply reverting the operation that hit the error until some kind of safe point (such as the last user interaction, or the last thread start).

Edit: spelling.

"dime kind of safe point"?

That's the whole idea of exceptions. Give up on some chunk of what you were doing, and recover to a previous state.

This works best when you have some notion of a transaction, and can get back to the state before the transaction started. This is what "ROLLBACK" in SQL does.
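To make the transaction idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `accounts` table, the `transfer` helper, and the simulated failure are all made up for illustration; the point is only that the exception handler returns the data to its state before the operation started.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two hypothetical accounts, or change nothing."""
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        # Simulated bug: the operation fails after the first update.
        if amount > 100:
            raise ValueError("simulated bug mid-transfer")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        conn.commit()
        return True
    except Exception:
        # Give up on this chunk of work and return to the last safe point.
        conn.rollback()
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 500), ("bob", 0)])
conn.commit()
```

A transfer that hits the simulated bug leaves both balances untouched, because the first UPDATE is undone along with everything else in the transaction. This only covers state held in the database; anything the failed code changed elsewhere would need its own recovery.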

  • Oops, getting late and typing on mobile... Corrected now, should have been 'some kind of safe point'.

    Yes, that is exactly my point in favor of exceptions, or at least against stopping the whole program with a core dump.

So you're basically suggesting writing logic to deal with bugs, i.e. for cases when the program has violated its own logical constraints? For me a bug is like a division by zero: the program violates its own logic, and the only logical conclusion is termination. I find fixing is often much simpler and faster with a loud bang than with some obscure, unwanted, incorrect result.

  • It often depends on what software you are writing. Writing a web service? Sure, you can kill the process. You only affect that one request; the user can retry, and other users are unaffected. Writing a server that maintains state for multiple realtime clients? Or a user-facing program? If an error can be safely constrained, it can be preferable to log the error but keep the application running.

  • Well, let me ask you this: why not stop the computer when a program encounters a bug? Why not the whole cluster? Theoretically, an attempt to write to a null pointer could happen because you have corrupted a database or file and the entire system is now in an unreliable state.

    The answer is that just as a process has some degree of isolation from other processes on the same system, and from the kernel, similarly components of a process can be well enough isolated that only the specific component that encountered the bug needs to be stopped. This is never 100% safe, but how safe it is depends greatly on the technology and architecture.
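As a rough sketch of what stopping only the failing component could look like inside one process, assuming a memory-safe runtime and components that share no mutable state: the `Supervisor` class and the component names below are hypothetical, not taken from any real framework.

```python
import logging

logger = logging.getLogger("supervisor")

class Supervisor:
    """Routes work to named components and stops any component that fails."""

    def __init__(self):
        self.components = {}   # name -> callable handling a payload
        self.stopped = set()   # names of components that hit a bug

    def register(self, name, handler):
        self.components[name] = handler

    def dispatch(self, name, payload):
        if name in self.stopped:
            return None  # this component is out of service
        try:
            return self.components[name](payload)
        except Exception:
            # Log loudly, stop only this component, keep the rest running.
            logger.exception("component %r failed; stopping it", name)
            self.stopped.add(name)
            return None

sup = Supervisor()
sup.register("parser", lambda text: int(text))  # raises on bad input
sup.register("echo", lambda text: text)
```

After `sup.dispatch("parser", "not a number")` fails, the parser is marked as stopped while `sup.dispatch("echo", "hi")` keeps working. How safe this is still depends on how well the components are actually isolated: if the failing handler had half-updated shared state, catching the exception would not undo that.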