
Comment by benterix

7 days ago

My lessons so far:

1. Less fun.

2. A lot more "review fatigue".

3. Tons of excess code I'd never put in there in the first place.

4. Frustration with agents being too optimistic, which over time verges on the ludicrous ("Task #3 has been completed successfully with 98% tests failing. [:useless_emojis:]").

5. Frustration with agents routinely going down a rabbit hole or around in circles, and the effort needed to straighten that out (Anthropic plainly advises starting from scratch in such cases - which is sound advice, but makes me feel like I just lost the last 5 hours of my life without even learning anything new).

I stopped using agents and use LLMs very sparingly (e.g. for review - they sometimes find some details I missed and occasionally have an interesting solution) but I'm enjoying my work so much more without them.

I think one of the tricks is to just stop using the agent as soon as you see signs of funny business. If it starts BSing me about failing tests, I turn it off immediately and git reset (maybe after taking a quick peek first).
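For what it's worth, the bail-out itself is cheap. A minimal sketch of the peek-then-reset flow, assuming everything you care about is already committed:

```sh
# Take the quick peek first: what did the agent actually change?
git status --short
git diff

# Then throw it all away: uncommitted edits and any new files it created
git reset --hard HEAD
git clean -fd   # careful: permanently deletes untracked files
```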

  • Yeah I make maybe two or three attempts at getting it to write a plan that it is able to follow coherently. But after that I pull the escape hatch and *gasp* program by hand.

    I've made the mistake of doubling down after a few initial failures to solve an issue, trying to write a super duper comprehensive, highly detailed, awesome plan that it will finally be able to implement correctly. But it just gets worse and worse the more I try, because it fundamentally does not understand what is going on, so it will inevitably find an opportunity to go massively off the rails, and the further down you lead it, the more impressive the derailment will be.

    My experience is that going around in endless circles with the model is just a waste of time, when you could have done it yourself in the time you've lost.

One thing I don't get - if you spend much of your time reviewing, you're just reading. You're not actually doing anything; you're passive in the activity of code production. By extension, you will become worse at knowing what a good standard of code is, and therefore worse at reviewing code.

I’m not a SWE so I have no interests to protect by criticising what is going on.

  • In my DJing years I learned that it's better to provide a hot signal and trim the volume than to try to amplify it later, because you end up amplifying noise. Max out the mixer volume and put a compressor after it (plus a limiter after that to protect the speaker setup - it will sound awful if hit, but it won't damage your setup, and it will flag clueless bozos loud and clear); don't try to raise the signal after it leaves the origin.

    It seems to me that adding noise to the process and trying to cut it out later is a self-defeating proposition. Or, as Deming put it (paraphrasing), you can't QC quality into a bad process.

    I can see how it seems better to "move fast and break things", but I will live and die by the opposite: "move slow and fix things". There's much, much more to life than maximizing short-term returns over a one-dimensional, naïve utilitarian take on value.

  • Tell that to Linus Torvalds.

    His whole job is just doing code review, and I'd argue he's better at coding now than he ever was before.

    • I'd be careful extrapolating from the creator of Linux and Git. His life and activities are not in line with those of more typical programmers.


    • It's not only that Linus is atypical, it's also that he is reviewing other people's code, and those people are also highly competent, or they would not be kernel committers. And they all share large amounts of high-quality and hard-earned implicit context.

      Reviewing well-executed changesets by skilled developers on complicated, deliberate projects is not comparable to "fleet of agents" vibe engineering. One of those tasks will sharpen you regardless of how lazily you approach it; the other requires extreme discipline to avoid atrophy.

I reset context probably every 5-10 minutes, if not more frequently, and commit even more often than that. If you're going 5 hours between commits or context resets, I'm not surprised you're getting bad results. If you ever see "summarizing" in Copilot, for example, that means you went way too far in that context window. LLMs get increasingly inaccurate and confused as the context window fills up.

Other things, like having it pull webpages in, will totally blow away your context. It's better to make a separate context just to pull a webpage down and summarize it in markdown, then reset the context.
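One way to sketch that, assuming Claude Code's one-shot print mode (`claude -p`); the URL and output path here are made up, and depending on your permission settings you may need to allow the fetch/write tools:

```sh
# Throwaway context: fetch and condense the page, then the context is gone.
# The main session only ever reads the small summary file.
claude -p "Fetch https://example.com/docs/api and write a concise markdown summary of the key points to notes/api-summary.md"
```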

  • The 'best' trick I learned from someone over here when working with Claude Code is to very regularly go back a few steps in your context (esc esc -> pick something a few steps up) and say something like "yeah, I already did this myself, now continue and do Y".

    This helps keep the context clean while still keeping the initial context I provided (usually with documentation and the initial plan setup) at its core.

    Now that you say this, I did notice webpages blowing away context, but I didn't think too much of it just yet - maybe there's some improvement to be found here using a subagent? I'm not a big fan of subagents (I didn't really get proper results out of them in my initial experiments anyway), but maybe adding a 'web researcher' subagent that summarizes to a concise markdown file could help here.
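    If it helps, a sketch of what such a subagent might look like, assuming Claude Code's agent files under `.claude/agents/` (e.g. saved as `.claude/agents/web-researcher.md`; the name, description, and prompt are all made up):

    ```markdown
    ---
    name: web-researcher
    description: Fetch one webpage and distill it into a short markdown summary file
    ---
    Fetch the requested URL, keep only the points relevant to the task,
    and write them as concise markdown to the path you are given.
    Reply with just the file path so the main context stays small.
    ```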

    • That's now dangerous to do, because rewinding the conversation history in Claude Code also reverts the code to that point. So while this technique may have worked in the past, it no longer works.


Regarding #3, I feel it's related to this idea: we can build a wood-frame house with 2x4s or with toothpicks. AI-directed and AI-generated code today tends to build things overly complex, with more pieces than necessary. I feel like an angry foreman, yelling at the AI to fix this, change that, etc. I feel I spend more time and energy supervising the AI while getting a sloppier end result.

  • Thankfully, yelling like an angry foreman is more effective on LLMs than on people.

    > Get your fucking act together, and stop with the bullshit comments, shipping unfinished code, and towering mess of abstractions. I've seen you code properly before. You're an expert for God's sake. One more mistake, and you're fired. Fix it, now!

    • I wouldn't talk that way to an LLM for fear of its bleeding over into my interactions with people.

      Back when computer performance was increasing faster than it is now and was more important to the user experience, a friend upgraded to a faster computer and suddenly became more impatient with me. He seemed to have expected my response time to have drastically decreased just like his computer's did.
