Comment by gopalv

7 hours ago

> Making models larger improves overall accuracy but doesn't reliably reduce incoherence on hard problems.

Coherence requires 2 opposing forces to hold it in one dimension, and at least 3 of them in higher dimensions of quality.

My team wrote up a paper titled "If You Want Coherence, Orchestrate a Team of Rivals"[1] because we kept finding that upping the reasoning threshold resulted in less coherence - more experimentation before hitting a dead end and turning around.

So we got better results from using Haiku (failing over to Sonnet) rather than Opus, and from using a higher-reasoning model to decompose tasks rather than to perform each one of them.

Once a plan is made, the cheaper models do better because they don't second-guess their approach - they either fail or they succeed, and they are not as tenacious as the higher-cost models.

If we fail hard and early, we can escalate to a higher authority and get out of that mess faster.

The knowledge of exactly how a failure happened seems to be less useful to the higher-reasoning model than it is to the action-biased models.

Splitting up the tactical and strategic sides of the problem seems to work, similar to how generals don't carry guns in a war.
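
Roughly, the loop looks like this - just a minimal sketch, not our actual harness; the model names and the complete() helper are placeholders for whatever API you call:

    PLANNER = "strong-reasoning-model"                 # strategic: writes the plan only
    WORKERS = ["cheap-fast-model", "mid-tier-model"]   # tactical: Haiku-then-Sonnet style failover

    def complete(model: str, prompt: str) -> str:
        """Stand-in for whatever LLM completion API you actually use."""
        raise NotImplementedError

    def decompose(task: str) -> list[str]:
        """The expensive model only decomposes; it never performs the steps itself."""
        plan = complete(PLANNER, f"Break this task into small numbered steps:\n{task}")
        return [line for line in plan.splitlines() if line.strip()]

    def run_step(step: str) -> str:
        """Cheapest worker first, one failover, then escalate - fail hard and early."""
        for model in WORKERS:
            result = complete(model, step)
            if "FAILED" not in result:   # crude success check, purely a placeholder
                return result
        # Escalate to higher authority instead of grinding on the failure.
        return complete(PLANNER, f"Both workers failed on this step; replan it:\n{step}")

    def run(task: str) -> list[str]:
        return [run_step(step) for step in decompose(task)]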

[1] - https://arxiv.org/abs/2601.14351

> Coherence requires 2 opposing forces

This seems very basic to any kind of information processing beyond straight-shot, predictable transforms.

Expansion and reduction of possibilities, branches, scope, etc.

Biological and artificial neural networks, where many signals converge and are then reduced by competition between them.

Scientific theorizing, followed by experimental testing.

Evolutionary genetic recombination and mutation, winnowed back by resource competition.

Generation, reduction, repeat.

And in a continually coordinated sense, too: many of our systems work best by encouraging simultaneous cooperation and competition.

Control systems: a command signal proportional to demand, vs. continually reverse-acting error feedback.
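
To make "generation, reduction, repeat" concrete, a toy sketch of the loop - the branching factor and the scoring here are arbitrary placeholders, not anything principled:

    import random

    def expand(candidate: str, k: int = 4) -> list[str]:
        """Opposing force 1: branch each candidate into k variations."""
        return [f"{candidate}.{i}" for i in range(k)]

    def fitness(candidate: str) -> float:
        """Opposing force 2: competitive pressure; random here as a stand-in."""
        return random.random()

    def generate_reduce(seed: str, rounds: int = 5, keep: int = 3) -> list[str]:
        """Expand the pool, winnow it back by competition, repeat."""
        pool = [seed]
        for _ in range(rounds):
            expanded = [v for c in pool for v in expand(c)]
            pool = sorted(expanded, key=fitness, reverse=True)[:keep]
        return pool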

  • > This seems very basic

    Yes, this is not some sort of hard-fought wisdom.

    It should be common sense, but I still see a lot of experiments which measure the sound of one hand clapping.

    In some sense, it is a product of laziness to automate human supervision with more agents, but on the other hand I can't argue with the results.

    If you don't really want the experiments and data from the academic paper, we have a white paper whose contents will be obvious to anyone who's read High Output Management, Mythical Man Month, and Philosophy of Software Design recently.

    Nothing in there is new, except that the field it is applied to has no humans left.