Comment by agnishom

5 days ago

I think the key premise here is that one can effectively and efficiently audit code that the LLM is producing.

I doubt that. First, human attention and speed are very limited. Second, when I see something, I am already predisposed to assume that it is right (or at the very least, my subsequent inquiries are extremely narrow and anchored around the solution presented to me).

Yeah. Reviewing code thoroughly is extremely time-consuming. When you review code from a human, you can look at the choices they made fairly quickly - say, they used framework X and language feature Y. Most importantly, you assume that they've checked that certain things work. This way, code review can be fast, but it still isn't thorough on its own. Most of it is trust and bureaucracy (big companies also do this to prevent malicious employees from smuggling in backdoors, etc.).

Code from LLMs that looks right, clean, and even clever passes as competence but is prone to hallucinations and business logic errors. In the short term, these changes will get through because of their appearance while containing more issues than a human would have produced in the same code. In the medium term, we just lose that signal - the assumptions we can make about the author's state of mind and comprehension. It's already incredibly hard to distinguish solid points from nonsense when the nonsense is laundered by an LLM.

Yeah, the problem is the fatigue of repeatedly reviewing code.

You do a few iterations until the code runs, review carefully, but notice a bug. So you do another iteration and 40% of the code changes. Now you need to review again, and you need to understand how the changes fit in.

Repeat this a few times and it becomes very tiring.

Ultimately you can't trust them not to do stupid shit. Your tests fail and you tell it to stop that? Sure, we can just catch those exceptions and the tests pass, etc. You get pissed off and tell it to FIX the CODE so the tests pass, and the cycle continues.
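Roughly the kind of thing I mean - a made-up Python sketch, where the function and test names are invented for illustration, not taken from any real codebase:

```python
# Hypothetical illustration of "making the tests pass" by swallowing the error
# instead of fixing the bug. parse_price and test_parse_price are invented names.

def parse_price(raw: str) -> float:
    try:
        # Real bug: values with thousands separators like "$1,299.00"
        # are never handled.
        return float(raw.strip("$"))
    except ValueError:
        # The "fix": catch the exception so nothing blows up...
        return 0.0


def test_parse_price():
    # ...and the test is weakened until it asserts the broken behaviour.
    assert parse_price("$1,299.00") == 0.0
```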

It's like working with a potentially gifted moron.

Exactly. And even if LLMs become more accurate and qualitatively better, that just raises the cost of validation. Errors and bad design decisions become harder to spot.