Comment by BarryMilo

2 months ago

I recently witnessed one such potential fuckup. The AI had written functioning code, except one of the business rules was misinterpreted. It would have broken in a few months’ time and caused a massive outage. I imagine many such time bombs are being deployed in many companies as we speak.


tombert 2 months ago

Yeah; I saw a 29,000-line pull request across seventy files recently. Realistically, 29,000 lines of new code all at once is beyond what a human could understand within the timeframe typically allotted for a code review.

Prior to generative AI I was (correctly) criticized once for making a 2,000-line PR, and I was told to break it up, which I did, but I think thousand-line PRs are going to be the new normal soon enough.

aurareturn 2 months ago

That’s the fault of the human who used the LLM to write the code and didn’t test it properly.

tombert 2 months ago
Exhaustive testing is hard, to be fair, especially if you don’t actually understand the code you’re writing. Tools like TLA+ and static analyzers exist precisely for this reason.
An example I use to talk about hidden edge cases:
Imagine we have this (pseudo)code:

  fn doSomething(num : int) {
    if num % 2 == 0 {
      return Math.sqrt(num)
    } else {
      return Math.pow(num, 2)
    }
  }

Someone might see this function and unit test each branch of the if statement, like:

  assert(doSomething(4) == 2)
  assert(doSomething(3) == 9)

These tests pass, it’s merged.
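For concreteness, here is a direct Python translation of the pseudocode, assuming Python's `math` module stands in for `Math` and renaming `doSomething` to `do_something` per Python convention. The two branch-based tests pass, just as described:

```python
import math


def do_something(num: int) -> float:
    # Even inputs take the square root; odd inputs are squared.
    if num % 2 == 0:
        return math.sqrt(num)
    else:
        return math.pow(num, 2)


# One test per branch, mirroring the two asserts above. Both pass.
assert do_something(4) == 2
assert do_something(3) == 9
```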
Except there’s a bug: what if you pass in a negative even number?
Depending on the language, you will get an exception, a NaN (Java and JavaScript both return NaN from Math.sqrt for negative input), or maybe a complex answer (which is not usually something you want). The fix in this particular case would be to add a conditional for negative input, or, more simply, to make the parameter an unsigned integer.
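In Python specifically, `math.sqrt` raises `ValueError` for a negative argument, while `cmath.sqrt` returns a complex number. The sketch below shows both, plus one possible conditional fix. `do_something_checked` is a hypothetical name, and it rejects all negative input (matching the unsigned-integer option); a different conditional could handle only negative evens instead.

```python
import cmath
import math


def do_something(num: int) -> float:
    # Unguarded version. In Python, -4 % 2 == 0, so negative even
    # inputs take the math.sqrt branch.
    if num % 2 == 0:
        return math.sqrt(num)
    return math.pow(num, 2)


def do_something_checked(num: int) -> float:
    # Hypothetical guarded version: reject negative input up front with
    # an explicit error, similar in effect to an unsigned parameter type.
    if num < 0:
        raise ValueError(f"num must be non-negative, got {num}")
    if num % 2 == 0:
        return math.sqrt(num)
    return math.pow(num, 2)


try:
    do_something(-4)
except ValueError as exc:
    # math.sqrt raises ValueError for negative arguments.
    print("unguarded:", exc)

# cmath.sqrt returns a complex result instead of raising.
print("cmath:", cmath.sqrt(-4))

try:
    do_something_checked(-4)
except ValueError as exc:
    # The guard fails early with a message that names the actual problem.
    print("guarded:", exc)
```

Python has no unsigned integer type, so an explicit check like this (or a type from a library) is the closest equivalent there.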
Obviously this is just a dumb example, and most people here could pick this up pretty quickly, but my point is that sometimes bugs can hide even when you do (what feels like) thorough testing.
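One cheap way to surface a hidden edge case like this, assuming the input range is small enough to check, is to sweep a span of inputs (including zero and negatives) instead of testing one representative per branch. The helper `find_failures` below is a hypothetical name used only for illustration:

```python
import math


def do_something(num: int) -> float:
    # The original, unguarded function.
    if num % 2 == 0:
        return math.sqrt(num)
    return math.pow(num, 2)


def find_failures(fn, inputs):
    """Return (input, exception) pairs for every input where fn raises."""
    failures = []
    for n in inputs:
        try:
            fn(n)
        except Exception as exc:
            # Record each failure instead of stopping at the first one.
            failures.append((n, exc))
    return failures


failures = find_failures(do_something, range(-5, 6))
print([n for n, _ in failures])  # [-4, -2]
```

This only catches inputs that raise. In a language where `sqrt` returns NaN instead, the sweep would also need to inspect the return values.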