Comment by stingraycharles

1 day ago

Yeah, the general “discovery” is that using the same reasoning compute effort, but spreading it over multiple different agents, generally leads to better results.

It solves the “longer thinking leads to worse results” problem by exploring multiple paths of thinking in parallel, just without thinking as long on each one.
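Roughly, the scheme could look something like the sketch below. This is a hedged illustration only: `run_agent` is a hypothetical stand-in for a real reasoning-model call with a capped thinking budget, and composing the drafts via another model call is just one plausible choice, not any particular vendor's method.

```python
# Hedged sketch of "split the thinking budget across parallel agents".
# run_agent is a hypothetical placeholder, not a real API.
from concurrent.futures import ThreadPoolExecutor

def run_agent(prompt: str, thinking_budget: int) -> str:
    # Placeholder: swap in a real model call that caps reasoning
    # at `thinking_budget` tokens.
    return f"[draft produced with ~{thinking_budget} thinking tokens]"

def parallel_reasoning(prompt: str, n_agents: int = 10,
                       total_budget: int = 10_000) -> str:
    per_agent = total_budget // n_agents  # e.g. 10 agents x 1,000 tokens each
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        drafts = list(pool.map(lambda _: run_agent(prompt, per_agent),
                               range(n_agents)))
    # Compose the independent drafts into one answer; majority voting or a
    # separate judge model are other common composition choices.
    return run_agent("Synthesize one answer from these drafts:\n"
                     + "\n---\n".join(drafts), per_agent)

print(parallel_reasoning("How many r's are in 'strawberry'?"))
```

In practice the composition step matters as much as the fan-out, but that's the general shape of it.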

> Yeah, the general “discovery” is that using the same reasoning compute effort, but spreading it over multiple different agents, generally leads to better results.

Isn’t the compute effort N times as expensive, where N is the number of agents? Unless you meant in terms of time (and even then, I guess it’d be the slowest of the N agents).

  • Not exactly N times, no. In a traditional transformer arch, token 1 is cheaper to generate than token 1,000, which is cheaper than token 10,000, and so on, because each new token attends over all the tokens before it. So running 10 concurrent sessions of 1,000 tokens each would be cheaper than 10,000 tokens in one session (see the rough arithmetic sketch after this list).

    You also run into context issues and quality degradation the longer you go.

    (this is assuming Gemini uses a traditional arch, and not something special regarding attention)

  • The idea is that instead of assigning 10,000 thinking tokens to one chain of thought, you assign 1,000 thinking tokens to each of 10 chains of thought and compose those independent outputs into a single answer, which yields better results.

    The fact that it can be done in parallel is just a bonus.
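To make the cost point above concrete, here's a back-of-the-envelope sketch. It only models the attention term, which grows with context length; the per-token MLP work scales linearly and is roughly the same either way, and any shared prompt prefix is ignored.

```python
# Rough attention-cost arithmetic for a vanilla decoder-only transformer:
# generating token t attends over roughly t prior tokens, so the attention
# work for an L-token generation grows like sum(1..L) ~ L**2 / 2.
def attention_cost(length: int) -> int:
    return sum(t for t in range(1, length + 1))

one_long_chain = attention_cost(10_000)        # one 10,000-token session
ten_short_chains = 10 * attention_cost(1_000)  # ten 1,000-token sessions

print(one_long_chain)                     # 50005000
print(ten_short_chains)                   # 5005000
print(one_long_chain / ten_short_chains)  # ~10x more attention work
```

So splitting the same total token budget into shorter runs cuts the quadratic attention term by roughly the fan-out factor, on top of the context-degradation point made above.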