← Back to context

Comment by Cerium

9 years ago

A little anecdote describing one such bug. I didn't find this, another one of my teammates did.

The symptom was that a board with a specific microcontroller on it would be working fine, then after a power cycle it might not keep working. A flash dump would show that the reset vector, the first byte of flash on that system, would be all zeroed out. Of course the system would not run anymore, but why did it happen? After months of intermittent debugging and trying to reproduce the cause was determined. At least under certain conditions the brownout detection level was lower than the voltage level that caused the CPU to make errors. If the board lost power slowly then the CPU would start executing corrupted / arbitrary instructions which generally included lots of zeros. It would occasionally write zeros to the zero address, bricking the board.

Since then we have external power monitoring and reset circuits on all the new boards, but existing ones needed a fix. Luckily the board had power failure interrupt connected, so when that triggers we reconfigure the CPU to execute on the slowest possible clock rate, which greatly reduced the occurrence.

Yes, we have seen similar in AVRs (328p) where the lowest voltage brownout threshold/trip is not enough to prevent the EEPROM and/or Flash from getting corrupted somehow...