Comment by ensiferum
5 years ago
In practice you always have mutated state in a complex system and not everything moves in transactional steps.
People seem scared to dump core but I find that doing it makes my programs much more robust and also simpler. I have no muddled logic to "deal with" bugs in the program itself. They simply abort the program.
I replied the same in another thread, but why not dump the core of all the processes on the system, instead of just the one that encountered the error? And why stop at this system, and not dump core on other systems that were communicating with this one over a network?
The process is one boundary of isolation, and you are making a bet that the corruption has not crossed this boundary. You can take the same bet with subcomponents of the process, just as in larger systems you may actually prefer to reboot the whole machine or even kill it and spin up another.
This all depends on the architecture and technology you are working with. If a user has input some bad data that I didn't think to validate (user inputs 11 in a page number field, when there are 5 pages in total), an operation is initiated on that data (user presses the 'go' button), and that operation is known to be stateless, when it encounters an error (ArrayIndexOutOfBounds), we can safely abort the operation, log the stack trace, and signal an 'internal server error' to the user without having to kill the whole process.
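To make the page-number case concrete, here is a minimal Java sketch of that boundary. The names (`PageHandler`, `renderPage`, `handle`) and the plain-string status responses are made up for illustration and stand in for whatever framework is actually in use. The assumption that carries the whole argument is that `renderPage` is stateless, so abandoning it midway leaves nothing half-written.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical handler; not taken from any real framework.
final class PageHandler {
    private static final Logger LOG = Logger.getLogger(PageHandler.class.getName());
    private static final String[] PAGES = {"page 1", "page 2", "page 3", "page 4", "page 5"};

    // Stateless operation: only reads immutable data and writes nothing.
    // The bug: pageNumber is used without being validated first.
    static String renderPage(int pageNumber) {
        return PAGES[pageNumber - 1];
    }

    // Boundary of isolation: one request. A bug inside the operation aborts
    // this request only, is logged with its stack trace, and the process
    // keeps serving other requests.
    static String handle(int pageNumber) {
        try {
            return "200 " + renderPage(pageNumber);
        } catch (RuntimeException e) {
            LOG.log(Level.SEVERE, "Unexpected failure while rendering page", e);
            return "500 Internal Server Error";
        }
    }
}
```

With this sketch, `PageHandler.handle(11)` hits the `ArrayIndexOutOfBoundsException`, logs it, and returns the 500 response, while `PageHandler.handle(3)` still works afterwards. If `renderPage` mutated shared state before failing, this bet would no longer be safe.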
Not to mention, in a program with many non-transactional state changes, dumping core could be a source of persistent corruption in itself, if another thread was doing something as simple as, for example, writing a JSON file.
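As a rough illustration of the JSON file case: a plain in-place write can be cut off halfway if the process is aborted by another thread's crash. One common mitigation, which is my own addition and only narrows the window for this one file, is to write to a temporary file and then rename it over the target. The class and method names below are hypothetical, and the sketch assumes a file system where `ATOMIC_MOVE` replaces an existing target (the Java documentation leaves that case implementation-specific). It does not fsync, so it says nothing about power loss.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration only.
final class AtomicJsonWriter {
    // Writes json to a temp file in the same directory, then renames it over
    // the target, so a reader sees either the old or the new content.
    static void write(Path target, String json) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "state", ".tmp");
        try {
            Files.write(tmp, json.getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp); // no-op once the move has succeeded
        }
    }

    // Small usage example: writes twice and returns what ends up on disk.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("atomic-demo");
            Path target = dir.resolve("state.json");
            write(target, "{\"page\":1}");
            write(target, "{\"page\":2}");
            return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This only protects a single file. Any state change that spans several files, or a file plus in-memory state, is still exposed to exactly the problem described above.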
Yes, if the user entered some bad data, a crash dump is not a good option - but the OP did separate that into a different case, and crash dumps are only suggested for "programmer errors", i.e. cases that were not expected during development. And I agree that it's better to crash in such cases, as I cannot write a good error handler for cases that I did not envision during development anyway.
In my example, we have forgotten to validate user data. We are proceeding with the invalid data as if it's trusted, and we get an ArrayIndexOutOfBounds - that is a bug, not a user error; we should have told the customer that 11 is not valid, instead we will have to tell them that an unspecified error occurred. If this were C++, it may well lead to memory corruption, so dumping core would almost certainly be the safest thing to do. But if it's Java, we can safely continue with program execution, in this case.
In other cases, even in Java, this may lead to corruption. For example, if we wrote that 11 into some kind of configuration file, we may have just corrupted the system persistently, even if we handle the error. So I'm not claiming that this is always a safe thing to do.
However, dumping core is also not safe a lot of the time. In fact, in multi-threaded processes, it can be argued that it is most likely unsafe to crash [as long as there is any kind of persistent state, such as a file] for the same reason interrupting a thread is inherently unsafe - there is absolutely no guarantee possible for what will happen if a thread is interrupted at an arbitrary point.