
Comment by ensiferum

5 years ago

There are 3 separate things that each require a different approach.

- errors, i.e. bugs made by the programmer
- logical "error" conditions that the program is expected to handle, for example a failed network connection or invalid user input
- unexpected error conditions that typically boil down to resource allocation errors: a socket could not be allocated, memory allocation failed, etc.

In my experience all of these are best handled with a different tool.

For bugs, use an assert that dumps core and leaves a clear stack trace. For conditions that the program needs to handle, use error codes. And finally, for the truly unexpected case, use exceptions.
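A minimal C++ sketch of that three-way split. The names `ParseError`, `parse_port`, `element_at` and `make_buffer` are made up for illustration and are not from the comment above. Note that `assert` is compiled out when `NDEBUG` is defined, so a release build would need its own always-on check to get the core dump.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical error codes for conditions the program is expected to handle.
enum class ParseError { Ok, Empty, NotANumber };

// Expected condition (bad user input): report it through an error code.
ParseError parse_port(const std::string& text, int& port_out) {
    if (text.empty()) return ParseError::Empty;
    int value = 0;
    for (char c : text) {
        if (c < '0' || c > '9') return ParseError::NotANumber;
        value = value * 10 + (c - '0');
        if (value > 65535) return ParseError::NotANumber;  // out of port range
    }
    port_out = value;
    return ParseError::Ok;
}

// Programmer bug: an out-of-range index violates the contract, so assert.
int element_at(const std::vector<int>& v, std::size_t i) {
    assert(i < v.size() && "caller passed an out-of-range index");
    return v[i];
}

// Unexpected resource failure: let the exception propagate to the caller.
std::vector<char> make_buffer(std::size_t bytes) {
    return std::vector<char>(bytes);  // throws if the memory cannot be obtained
}
```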

Why dump core when you can log the bug and continue? Sure, in development we want things to fail fast and loud, but when deployed with a customer, I don't want my whole program to crash because there is one obscure code path that has a problem.

And even for conditions that the program is expected to handle, 99.9% of the time all it can do is notify the user and ask for guidance, which means that the error must be bubbled up from a networking or storage layer all the way to the presentation layer - a perfect task for exceptions or something like an error monad.

The only problem with exceptions or error monads is that they get tricky in the presence of resources that need to be released, and even that is well handled with patterns like RAII.
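As a rough sketch of that combination in C++: an exception thrown in a storage layer bubbles up to the presentation layer, while an RAII wrapper releases the file handle on the way out. The `File` class, `save_document` and `on_save_clicked` are hypothetical names chosen for this example.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// RAII wrapper: the destructor closes the file on every exit path,
// including when an exception unwinds the stack.
class File {
public:
    File(const std::string& path, const char* mode)
        : handle_(std::fopen(path.c_str(), mode)) {
        if (!handle_) throw std::runtime_error("cannot open " + path);
    }
    ~File() { std::fclose(handle_); }
    File(const File&) = delete;
    File& operator=(const File&) = delete;
    std::FILE* get() const { return handle_; }

private:
    std::FILE* handle_;
};

// Storage layer: throws on failure and knows nothing about the UI.
void save_document(const std::string& path, const std::string& contents) {
    File file(path, "w");
    if (std::fputs(contents.c_str(), file.get()) == EOF)
        throw std::runtime_error("write failed for " + path);
}  // file is closed here, whether or not an exception was thrown

// Presentation layer: the one place that turns the failure into a message.
std::string on_save_clicked(const std::string& path,
                            const std::string& contents) {
    try {
        save_document(path, contents);
        return "Saved.";
    } catch (const std::exception& e) {
        return std::string("Save failed: ") + e.what();
    }
}
```

The intermediate layers need no error-handling code at all, which is the appeal; the cost is that every resource has to be owned by an object like `File` for the unwinding to be safe.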

  • > Why dump core when you can log the bug and continue?

    I see from your replies what you're trying to say. If an error occurs, most likely you want the entire operation to abort -- that doesn't necessarily mean the whole program depending on the program.

    For example, if I have a GUI app and the "save" operation fails, I typically roll that back right to the event loop of the application; the user gets an error and can retry the save.

    For other types of applications, killing the whole process is ending the operation.

    • Yes, exactly!

      And on the other end of spectrum, there are even systems where it makes sense to go further than killing a single process, and kill the whole container or even VM where a buggy condition was encountered.

  • > I don't want my whole program to crash because there is one obscure code path that has a problem.

    If that one obscure code path corrupted my state, I want to limit the incorrect actions that the software takes based on that state.

    This "want" of mine is to be balanced with all the other things I want out of the program, and the relative weights will vary by context... but it is often the case that continuing erroneously risks more harm than simply falling over.

    • That is indeed a very common need in a non-memory-safe language.

      Still, not all bugs have unbounded impact. As long as memory is not corrupted, things like off-by-one errors and null pointer exceptions can often be safely recovered from by simply reverting the operation that hit the error to some kind of safe point (such as the last user interaction, or the last thread start).

      Edit: spelling.


  • Because correctness is important to me. I don't want my programs to silently go about in a buggy state producing incorrect results in a corrupted state.

    • Not all bugs put the whole program in a corrupted state, especially if your program is written to be mostly stateless. A common pattern is that a state change is tried, it fails because of a bug or some other error, it is reverted, and an error is shown to the user. I would call this a robust program. Of course, not every error can be recovered from this way, but it is very often possible (assuming we are talking about memory-safe languages; otherwise, the balance of probabilities is entirely the other way around).


For the third case it’s better to just abort. Tell the user to get more RAM or something. What are you supposed to do when you’re out of memory? Catch the exception? Then what?

Related, I always find it funny when C programmers write `if (malloced == NULL) return NULL;` Either you’re going to forget that this can happen and dereference null (in which case it’s just better to abort the program immediately) or the caller will check this and then close the program. If it doesn’t, the next malloc will be null anyways, and the problem repeats. Just call abort().
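The "just call abort()" approach is usually wrapped in a small helper so that callers never see a null pointer. A sketch, written in C++ but using only the C allocation functions; the name `xmalloc` is a common convention, not a standard function.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Returns a valid pointer or terminates the process; never returns null
// for a non-zero size, so callers need no check of their own.
void* xmalloc(std::size_t size) {
    void* p = std::malloc(size);
    // malloc(0) is allowed to return null, so only a non-zero request
    // that comes back null counts as a failure.
    if (p == nullptr && size != 0) {
        std::fputs("out of memory\n", stderr);
        std::abort();
    }
    return p;
}
```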

  • Well, memory failure checking is usually put in the "can't do shit" category, which isn't necessarily true. In both C and C++, bad_alloc or a null from malloc indicates that the memory manager could not find the memory. This may or may not mean that your application has overcommitted memory at the OS level; it depends entirely on the actual memory manager. So to me the failure is just a general resource allocation failure. Would you dump core if your program failed to allocate a socket? Or a mutex?
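One way to treat a failed allocation as an ordinary resource failure is to catch `std::bad_alloc` at the point where a cheaper alternative exists. A sketch under that assumption; `Buffer`, `try_allocate` and `allocate_with_fallback` are hypothetical names. On systems that overcommit memory, a moderately large request may appear to succeed and only fail later when the pages are touched, so this only helps where the memory manager actually reports the failure.

```cpp
#include <cstddef>
#include <memory>
#include <new>

struct Buffer {
    std::unique_ptr<char[]> data;
    std::size_t size = 0;
};

// Attempt one allocation; on failure return an empty Buffer instead of
// letting std::bad_alloc escape.
Buffer try_allocate(std::size_t bytes) {
    Buffer b;
    try {
        b.data.reset(new char[bytes]);
        b.size = bytes;
    } catch (const std::bad_alloc&) {
        // Treated like any other resource that could not be acquired.
    }
    return b;
}

// Try the preferred size first, then fall back to a smaller one.
// The result has data == nullptr if even the minimum cannot be obtained.
Buffer allocate_with_fallback(std::size_t preferred, std::size_t minimum) {
    Buffer b = try_allocate(preferred);
    if (!b.data) b = try_allocate(minimum);
    return b;
}
```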