Comment by faeyanpiraat

7 days ago

Looking at it from afar, it's simply making something large from a smaller input, so it's kind of like nondeterministic decompression.

What fills the holes is best practices; what can ruin the result is wrong assumptions.

I don't see how full autonomy can work either without checkpoints along the way.

Totally agreed. Those assumptions often compound as well: the AI makes one wrong decision early in the process, and it affects N downstream assumptions. By the time it finishes, it has built the wrong thing. And that happens with a single process running. Even on the latest Opus models I have to babysit, correct, and redirect Claude Code constantly. There's zero chance that five Claude Code instances running for hours without my input will build the thing I actually need.
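To put a number on the compounding point: if each decision carries an independent chance p of a wrong assumption, the probability that an N-decision run stays clean is (1 − p)^N, which falls off quickly. A toy sketch (the 5% per-decision rate is purely hypothetical):

```python
# Sequential compounding: one early wrong assumption taints everything after it.
p_step = 0.05  # hypothetical chance of a wrong assumption per decision

# P(no wrong assumption anywhere) after n independent decisions
p_clean = {n: (1 - p_step) ** n for n in (1, 10, 50, 100)}

for n, p in p_clean.items():
    print(f"{n:3d} decisions -> P(run stays clean) = {p:.2f}")
# 100 decisions leave well under a 1% chance of a clean run at this rate.
```

This is the worst case where decisions are unchecked; the whole point of checkpoints is to reset the chain before errors propagate.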

And at the end of the day it's not the agents who are accountable for the code running in production. It's the human engineers.

  • Actually it works the other way: with multiple agents, they can often correct each other's mistaken assumptions. Part of the value of this approach is precisely that you get better results with fewer hallucinated assumptions.

    Still makes this change from Anthropic stupid.

    • The corrective agent has exactly the same chance of making a mistake: "correcting" an assumption that was previously correct into an incorrect one.

      If a singular agent has a 1% chance of making an incorrect assumption, then 10 agents have that same 1% chance in aggregate.
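      Whether the 1% figure aggregates like that depends on how the agents' errors correlate and how their outputs are combined. A toy sketch under an independence assumption (the 1% rate and the majority-vote combining rule are hypothetical, not how any particular product works):

      ```python
      from math import comb

      p = 0.01  # assumed per-agent chance of a wrong assumption
      n = 10    # number of agents

      # If any single wrong agent derails the result, aggregate risk grows with n:
      p_any = 1 - (1 - p) ** n

      # If agents cross-check and a majority vote decides, aggregate risk shrinks:
      p_majority = sum(comb(n, k) * p**k * (1 - p) ** (n - k)
                       for k in range(n // 2 + 1, n + 1))

      print(f"P(at least one agent errs) = {p_any:.4f}")       # ≈ 0.0956
      print(f"P(majority of agents err)  = {p_majority:.2e}")  # ≈ 2e-10
      ```

      Under independence, "at least one agent errs" is far more likely than 1%, while a majority vote drives the combined error rate far below any single agent's. The aggregate stays at exactly 1% only in the fully correlated case where every agent shares the same mistake.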


  • Take a look at the latest Codex on very-high. Claude’s astroturfed IMHO.

    • Can you explain more? I'm talking about LLM/agent behavior in a generalized sense, even though I used claude code as the example here.

      What is Codex doing differently to solve for this problem?