Comment by hughdbrown

6 days ago

In my experience, the most pernicious temptation is to take the buggy, non-working code you have now and to try to modify it with "fixes" until the code works. In my experience, you often cannot get broken code to become working code because there are too many possible changes to make. In my view, it is much easier to break working code than it is to fix broken code.

Suppose you have a complete chain of N Christmas lights and they do not work when turned on. The temptation is to go through all the lights and to substitute in a single working light until you identify the non-working light.

But suppose there are multiple non-working lights? You'll never find the error with this approach. Instead, you need to start with the minimal working approach -- possibly just a single light (if your Christmas lights work that way), adding more lights until you hit an error. In fact, the best case is if you have a broken string of lights and a similar but working string of lights! Then you can easily swap a test bulb out of the broken string and into the working chain until you find all the bad bulbs in the broken string.

Starting with a minimal working example is the best way to fix a bug I have found. And you will find you resist this because you believe that you are close and it is too time-consuming to start from scratch. In practice, it tends to be a real time-saver, not the opposite.

The quickest solution, assuming learning from the problem isn't the priority, might be to replace the entire chain of lights without testing any of them. I've been part of some elusive production issues where eventually 1-2 team members attempted a rewrite of the offending routine while everyone else debugged it, and the rewrite "won" and shipped to production before we found the bug. Heresy I know. In at least one case we never found the bug, because we could only dedicate a finite amount of time to a "fixed" issue.

  • > The quickest solution, assuming learning from the problem isn't the priority, might be to replace the entire chain of lights without testing any of them.

    So as a metaphor for software debugging, this is "throw away the code, buy a working solution from somewhere else." It may be a way to run a business, but it does not explain how to debug software.

    • The worst cases in debugging are the ones where the mental model behind the code is wrong and, in those cases, "throw it away" is the way out.

      Those cases are highly seductive because the code seems to work 98% and you'd think it is another 2% to get it working but you never get to the end of the 2%, it is like pushing a bubble around under a rug. (Many of today's LLM enthusiasts will, after they "pay their dues", will wind up sounding like me)

      This is documented in https://www.amazon.com/Friends-High-Places-W-Livingston/dp/0... and takes its worst form when you've got an improperly designed database that has been in production for some time and cannot be 100% correctly migrated to a correct database structure.

  • Depending on how your routine looks like, you could run both on a given input, and see when / where they differ.