Comment by addaon
4 days ago
You can ask it "why", and it gives a probable English string that could reasonably explain why, had a developer written that code, they made certain choices; but there's no causal link between that and the actual code generation process that was previously used, is there? As a corollary, if Model A generates code, Model A is no better able to explain it than Model B.
I think that's right, and not a problem in practice. It's like asking a human why: "because it avoids an allocation" is a more useful response than "because Bob told me I should", even if the latter is the actual cause.
> I think that's right, and not a problem in practice. It's like asking a human why: "because it avoids an allocation" is a more useful response than "because Bob told me I should", even if the latter is the actual cause.
Maybe this is the source of the confusion between us? If I see someone writing overly convoluted code to avoid an allocation, and I ask why, I will take different actions based on those two answers! If I get the answer "because it avoids an allocation," then my role as a reviewer is to educate the code author about the trade-off space, make sure that the trade-offs they're choosing are aligned with the team's value assessments, and help them make more-aligned choices in the future. If I get the answer "because Bob told me I should," then I need to both address the chain-of-command issues here and educate /Bob/. An answer is "useful" in that it lets me take the correct action to get the PR to the point where it can be submitted, and prevents me from having to make the same repeated effort on future PRs... and truth actually /matters/ for that.
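(To make that trade-off concrete, here's a minimal, hypothetical Rust sketch; the names are invented for illustration. The first version allocates on every call; the second avoids the allocation by reusing a caller-owned buffer, at the cost of a clunkier interface.)

    // Simple version: one heap allocation per call.
    fn squares_simple(n: u64) -> Vec<u64> {
        (0..n).map(|i| i * i).collect()
    }

    // Allocation-avoiding version: the caller owns and reuses the buffer,
    // which saves the allocation but pushes buffer management onto every
    // call site (the kind of complexity a reviewer has to weigh).
    fn squares_into(n: u64, out: &mut Vec<u64>) {
        out.clear();
        out.extend((0..n).map(|i| i * i));
    }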
Similarly, if an LLM gives an answer about "why" it made a decision that I don't want in my code base, and that answer has no causal link to the actual process of generating the code, it doesn't give me anything to work with to prevent it from happening next time. I can spend as much effort as I want explaining (and adding to future prompts) how much code complexity we're willing to trade off to avoid an allocation in different cases (on the main event loop, etc.)... but if that isn't part of what actually fed into making that trade-off, it's a waste of my time, no?
Right. I don't treat the LLM like a colleague at all; it's just a text generator. So I partially agree with your earlier statement:
> it's like reviewing a PR with no trust possible, no opportunity to learn or to teach, and no possibility for insight that will lead to a better code base in the future
The first part is 100% true. There is no trust. I treat any LLM code as toxic waste and its explanations as lies until proven otherwise.
With the second part I somewhat disagree. I've learned plenty of things from AI output and analysis. You can't teach it to analyze allocations or code complexity, but you can feed it guidelines or samples of code in a certain style, and that can be quite effective at nudging it towards similar output. Sometimes that doesn't work, and that's fine; it can still be a big time saver to have the LLM output as a starting point and tweak it (manually, or by giving the agent additional instructions).