Comment by daxfohl

12 hours ago

I want it to generate better code but less of it, and be more proactive about getting human feedback before it starts going off the rails. This sounds like an inexorable push in the opposite direction.

I can see this approach being useful once the foundation is more robust, has better common sense, and knows when to push back when requirements conflict or are underspecified. But with current models I can only see it exacerbating the problem; a coding agent's solution is almost always "more code", not less. It makes for a nice demo, but I can't imagine it building anything that wouldn't have huge operational problems and 10x-100x more code than necessary.

Agreed. I'm constantly coming back to a Claude tmux pane only to find it's decided to do something completely ridiculous. Just the other day I had it adding test coverage stats to CI runs, and when I came back it was basically trying to reinvent Istanbul in a bash script because the nyc tool wasn't installed in CI. I had to interrupt it and say "uh, just install nyc?". Its response: "You're absolutely right!".

  • > it was basically trying to reinvent Istanbul in a bash script because the nyc tool wasn't installed in CI

    Through the first half of this comment, I read "trying to reinvent Istanbul in a bash script" as a funny way of saying "it was generating a lot of code" (as in, a city's worth of code)

They haven’t released this feature, so maybe they know the models aren’t good enough yet.

I also think it’s interesting to see Anthropic continue to experiment at the edge of what models are capable of, and having it in the harness will probably let them fine-tune for it. It may not work today, but it might work at the end of 2026.

  • True, though even then I kind of wonder what's the point. Once they build an AI that's as good as a human coder but 1000x faster, parallelization no longer buys you anything. Writing and deploying the code is no longer the bottleneck, so the extra coordination required for parallelism seems like extra cost and risk with no practical benefit.

    • Each agent having its own fresh context window for each task is probably, by itself, a good way to improve quality. And I can imagine agents reviewing each other's work improving quality further, the way GPT-5 Pro improves on GPT-5 Thinking.

All you have to do is set up an MCP server that routes to a human on the backend, and you've got an AI that asks for human feedback.
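
Something like this, as a rough sketch: it assumes the official TypeScript MCP SDK (@modelcontextprotocol/sdk, plus zod for the tool schema), and askHuman() is a hypothetical stand-in for whatever channel actually reaches a person:

    // Hand-wavy sketch: an MCP server exposing one "ask_human" tool.
    // Assumes the official TypeScript SDK (@modelcontextprotocol/sdk) and zod;
    // askHuman() is hypothetical plumbing to whatever reaches a human.
    import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
    import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
    import { z } from "zod";

    // Hypothetical backend: deliver the question to a human and block on
    // the reply (Slack thread, email, a web form with long-polling, etc.).
    async function askHuman(question: string): Promise<string> {
      throw new Error("wire this up to your notification channel");
    }

    const server = new McpServer({ name: "human-feedback", version: "0.1.0" });

    // The one tool the agent calls when requirements are unclear or conflict.
    server.tool(
      "ask_human",
      { question: z.string().describe("what the agent wants clarified") },
      async ({ question }) => ({
        content: [{ type: "text", text: await askHuman(question) }],
      })
    );

    await server.connect(new StdioServerTransport());

Point the agent at that server and it has an ask_human tool available before it goes off the rails; whether current models actually call it at the right moments is the open question upthread.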

Antigravity and others already ask for human feedback on their plans.