
Comment by BarryMilo

2 months ago

I recently witnessed one such potential fuckup. The AI had written functioning code, except that it had misinterpreted one of the business rules. It would have broken in a few months' time and caused a massive outage. I imagine many such time bombs are being deployed in many companies as we speak.

Yeah; I saw a 29,000 line pull request across seventy files recently. I think that realistically 29,000 lines of new code all at once is beyond what a human could understand within the timeframe typically allotted for a code review.

Prior to generative AI I was (correctly) criticized once for making a 2,000 line PR, and I was told to break it up, which I did, but I think thousand-line PRs are going to be the new normal soon enough.

That’s the fault of the human who used the LLM to write the code and didn’t test it properly.

  • Exhaustive testing is hard, to be fair, especially if you don’t actually understand the code you’re writing. Tools like TLA+ and static analyzers exist precisely for this reason.

    An example I use to talk about hidden edge cases:

    Imagine we have this (pseudo)code:

      fn doSomething(num : int) {
        if num % 2 == 0 {
          return Math.sqrt(num)
        } else {
          return Math.pow(num, 2)
        }
      }
    

    Someone might see this function, and unit test it based on the if statement like:

        assert(doSomething(4) == 2)
        assert(doSomething(3) == 9)
    

    These tests pass, so it's merged.

    Except there’s a bug in this; what if you pass in a negative even number?

    Depending on the language, you will get an exception, a NaN, or maybe a complex answer (which is not usually something you want). The solution in this particular case would be to add a conditional that rejects negative input, or more simply to make the type an unsigned integer.

    Obviously this is just a dumb example, and most people here could pick this up pretty quick, but my point is that sometimes bugs can hide even when you do (what feels like) thorough testing.
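For anyone who wants to poke at it, here is a rough Python port of the example above. It is only a sketch: the names `do_something_checked` and `find_crashing_inputs` are made up for illustration, and the "fix" simply rejects negative input, which is one of the two options mentioned above. The idea is that sweeping a range of inputs, rather than picking one case per branch, is enough to surface the negative even numbers.

```python
import math


def do_something(num: int) -> float:
    """Direct Python port of the pseudocode above, bug included."""
    if num % 2 == 0:
        return math.sqrt(num)  # raises ValueError for negative even numbers
    return math.pow(num, 2)


def do_something_checked(num: int) -> float:
    """Hypothetical fix: reject negative input up front."""
    if num < 0:
        raise ValueError(f"num must be non-negative, got {num}")
    if num % 2 == 0:
        return math.sqrt(num)
    return math.pow(num, 2)


def find_crashing_inputs(fn, candidates):
    """Return the inputs for which fn raises ValueError."""
    failures = []
    for n in candidates:
        try:
            fn(n)
        except ValueError:
            failures.append(n)
    return failures


# The two original tests still pass against the buggy version.
assert do_something(4) == 2
assert do_something(3) == 9

# Sweeping a range that includes negatives surfaces the hidden case.
print(find_crashing_inputs(do_something, range(-4, 5)))  # [-4, -2]
```

In Python the failure is a `ValueError` from `math.sqrt`; other languages would return NaN instead, so the sweep would need to check results rather than catch exceptions.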