Comment by arthurjj
10 hours ago
LLMs not being lazy enough definitely feels true. But it's unclear to me whether it's a permanent issue, one that will be fixed in the next model upgrade, or just one your agent framework or CI/CD pipeline takes care of.
e.g. Right now, when using agents, after I'm "done" with a feature and commit, I usually prompt "Check for any bugs or refactorings we should do." I could see a CI/CD step that says "Look at the last N commits and check if the code in them could be simplified or refactored to have a better abstraction."
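As a rough sketch of what that CI step might collect before handing off to an agent (the prompt wording and the `build_review_prompt` helper are hypothetical, and a real job would pull commits via `git log`):

```python
def build_review_prompt(commits, n=5):
    """Build a review prompt from the last n commit summaries.

    `commits` is a list of (sha, subject, diff) tuples, e.g. gathered
    by a CI job from `git log` output.
    """
    parts = [
        "Look at the following recent commits and check if the code in "
        "them could be simplified or refactored to a better abstraction."
    ]
    for sha, subject, diff in commits[:n]:
        parts.append(f"--- commit {sha}: {subject} ---\n{diff}")
    return "\n".join(parts)

# Example: two fake commits standing in for real git history.
prompt = build_review_prompt([
    ("a1b2c3d", "add user export", "+def export_users(): ..."),
    ("d4e5f6a", "add org export", "+def export_orgs(): ..."),
])
```

The interesting part isn't the string-building, of course; it's whether the agent on the other end can actually spot the shared abstraction across those commits.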
I've tried this approach of instructing the LLM to look for opportunities to abstract, but it's not good at finding the commonalities after the fact, once possibly related functions have already diverged unnecessarily. It writes "sloppy" code, that is to say, code that is locally correct but which fails to build toward overall generalizations. That sloppy code is a cul-de-sac: easy to write, but it adds to the messiness and is really tough to improve.
When a good programmer writes a new feature, they are looking for both existing and new abstractions that can be applied. They are considering their mental model of the whole system and examining whether it can be leveraged or needs to be updated. That's how they avoid compounding complications.
To take a big-picture view like that, the LLM needs the right context. It would need an explicit model of the system and a way to decide when that model should be updated. For now, just telling it what to write isn't enough to get good code; you also have to tell it what to pay attention to.
> When a good programmer writes a new feature, they are looking for both existing and new abstractions that can be applied. They are considering their mental model of the whole system and examining whether it can be leveraged or needs to be updated. That's how they avoid compounding complications.
This is actually a pretty good argument that it's a permanent issue. I haven't tried writing, or having an LLM write, a summary of the coding style of any of my codebases, but my hunch is that it wouldn't do a good job either writing it or taking it into account when coding a new feature.
"Programming as theory building" undefeated still.
It’s difficult to define a termination criterion for that. When you ask LLMs to find any X, they usually find something they claim qualifies as X.
Agreed. If I'm looking at what it proposes, then about half the time I don't make the changes. If this were fully automated, you would need an addendum like "Only make the change if it saves over 100 lines of code or removes 3 duplicate pieces of logic."
There are other scenarios you would want to check for, but you get the idea.
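A fully automated gate along those lines might look something like this sketch, using the exact thresholds suggested above (the `DiffStats` shape is made up; a real pipeline would derive these numbers from the proposed diff):

```python
from dataclasses import dataclass

@dataclass
class DiffStats:
    lines_saved: int         # net LOC removed by the proposed refactor
    duplicates_removed: int  # duplicate pieces of logic eliminated

def should_apply(stats, min_lines=100, min_duplicates=3):
    """Accept a proposed refactor only if it clears one of the bars."""
    return (stats.lines_saved > min_lines
            or stats.duplicates_removed >= min_duplicates)

# A small cleanup that saves 12 lines is rejected; a big dedup passes.
assert not should_apply(DiffStats(lines_saved=12, duplicates_removed=1))
assert should_apply(DiffStats(lines_saved=0, duplicates_removed=3))
```

The hard part is still measuring "duplicate pieces of logic" reliably, which is exactly the termination-criterion problem mentioned above.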
I agree, it's not a fundamental characteristic but a limitation of how the tool is being used.
If you just tell these things to add, they'll absolutely do that indiscriminately. You end up with these huge piles of slop.
But if I tell an LLM-backed harness to reduce LOC and DRY things up during the review phase, it will do that too.
I think you're more likely to get the huge piles if you delegate a large task and don't review it (either yourself or with an agent).