Comment by daxfohl

12 hours ago

I want it to generate better code but less of it, and be more proactive about getting human feedback before it starts going off the rails. This sounds like an inexorable push in the opposite direction.

I can see this approach being useful once the foundation is more robust, has better common sense, and knows when to push back when requirements conflict or are underspecified. But with current models I can only see it exacerbating the problem; a coding agent's solution is almost always "more code", not less. It makes for a nice demo, but I can't imagine it building anything that wouldn't have huge operational problems and 10x-100x more code than necessary.

Agreed. I'm constantly coming back to a Claude tmux pane only to find it's decided to do something completely ridiculous. Just the other day I had it adding test coverage stats to CI runs, and when I came back it was basically trying to reinvent Istanbul in a bash script because the nyc tool wasn't installed in CI. I had to interrupt it and say "uh, just install nyc?". Its response: "You're absolutely right!".

  • > it was basically trying to reinvent Istanbul in a bash script because the nyc tool wasn't installed in CI

    Through the first half of this comment, I read "trying to reinvent Istanbul in a bash script" as a funny way of saying "it was generating a lot of code" (as in, a city's worth of code)

They haven’t released this feature, so maybe they know the models aren’t good enough yet.

I also think it’s interesting to see Anthropic continue to experiment at the edge of what models are capable of, and having it in the harness will probably let them fine-tune for it. It may not work today, but it might work at the end of 2026.

  • True, though even then I kind of wonder what's the point. Once they build an AI that's as good as a human coder but 1000x faster, parallelization no longer buys you anything. Writing and deploying the code is no longer the bottleneck, so the extra coordination required for parallelism seems like extra cost and risk with no practical benefit.

    • Each agent having its own fresh context window for each task is probably, by itself, a good way to improve quality. And I can imagine agents reviewing each other's work improving quality further, the way GPT-5 Pro improves on GPT-5 Thinking.

All you have to do is set up an MCP server that routes to a human on the backend, and you've got an AI that asks for human feedback.
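
Something like this, as a rough sketch: it assumes the official TypeScript MCP SDK (@modelcontextprotocol/sdk, plus zod for the tool schema), and askHuman() is a hypothetical stand-in for whatever channel actually reaches a person:

    // Hand-wavy sketch: an MCP server exposing one "ask_human" tool.
    // Assumes the official TypeScript SDK (@modelcontextprotocol/sdk) and zod;
    // askHuman() is hypothetical plumbing to whatever reaches a human.
    import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
    import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
    import { z } from "zod";

    // Hypothetical backend: deliver the question to a human and block on
    // the reply (Slack thread, email, a web form with long-polling, etc.).
    async function askHuman(question: string): Promise<string> {
      throw new Error("wire this up to your notification channel");
    }

    const server = new McpServer({ name: "human-feedback", version: "0.1.0" });

    // The one tool the agent calls when requirements are unclear or conflict.
    server.tool(
      "ask_human",
      { question: z.string().describe("what the agent wants clarified") },
      async ({ question }) => ({
        content: [{ type: "text", text: await askHuman(question) }],
      })
    );

    await server.connect(new StdioServerTransport());

Point the agent at that server and it has an ask_human tool available before it goes off the rails; whether current models actually call it at the right moments is the open question upthread.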

Antigravity and others already ask for human feedback on their plans.