Comment by EFLKumo
5 hours ago
> just like we don’t read assembly, or bytecode, or transpiled JavaScript
This makes sense since certain higher-level code produces certain lower-level code, while LLM cannot. If the transpired JS code doesn't work we could just find out the bug in minifiers, etc. but one cannot figure out why LLM fails at one task, especially considering LLMs, even SOTA ones, could be strongly affected by even small prompt changes. Taking this into consideration, I don't think this is a sound reasoning why we don't need to review ai-generated code.
> The LLMs produce non-deterministic output and generate code much faster than we can read it, so we can’t seriously expect to effectively review, understand, and approve every diff anymore.
Exactly. However, this could also indicate a weaker review standard instead of just dropping review. We could also suggest an idea where devs mainly review code design or interfaces, leveraging one's *taste*, while leaving strict logic reasoning, validating and testing to other tools or approaches. It cannot pursuade me that the nature of LLM's code generation must lead to a complete cancel of the code review.
Anyway, I'm not opposing this article and its thought of shift in the future is really good.
Couldn't we slowly add guardrails that eventually lead to code generation becoming more and more deterministic over time?
I'm seeing in my experience that Claude has become better with every version at producing uniformity in its code output. Especially where the architecture is clear and documented. And even more so in languages with built in uniformity (Go, HTMX, SQL) where there is intentionally only one or two ways of doing things. In such environments, the output is nearly deterministic.