Comment by irthomasthomas

11 hours ago

I have had a better experience with my own use. I use it every day and it rarely fails to improve tasks. Perhaps the prompts and rubrics make a difference. And finding bugs is one of the better use cases because it is essentially a search problem. As long as models are non-deterministic and there is some diversity in their training data, an ensemble that iterates on the problem is more likely to cover the ground needed to solve it.
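
A back-of-envelope sketch of that coverage argument, not a description of how llm-consortium works internally: if each attempt finds the bug with probability p, and the attempts are treated as independent, the chance that at least one of n attempts finds it is 1 - (1 - p)^n. The function name and the 0.2 per-attempt rate are made up for illustration, and independence is an idealization. Real models share a lot of training data, so their failures are correlated and the actual gain is smaller.

```python
def p_any_success(p_single: float, n_attempts: int) -> float:
    """Chance that at least one of n independent attempts succeeds.

    Assumes every attempt has the same success rate and that attempts
    are independent, which real model ensembles only approximate.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if n_attempts < 0:
        raise ValueError("n_attempts must be non-negative")
    return 1.0 - (1.0 - p_single) ** n_attempts


# Hypothetical 20% per-attempt hit rate, chosen only for illustration.
for n in (1, 3, 5, 10):
    print(n, round(p_any_success(0.2, n), 3))
# prints:
# 1 0.2
# 3 0.488
# 5 0.672
# 10 0.893
```

The curve flattens as n grows, which fits the point that some tasks benefit more than others: the extra attempts only pay off when a single attempt has a real but modest chance of landing on the answer.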

Some tasks benefit from this approach more than others. There was a paper from Google describing a very similar system that achieved what was then SOTA on planning and pathfinding benchmarks.

edit:

Mind Evolution paper https://deepmind.google/research/publications/122391/

(That was a month after I published llm-consortium :) https://xcancel.com/karpathy/status/1870692546969735361