Comment by irthomasthomas
12 hours ago
I have had a better experience with my own use. I use it every day and it rarely fails to improve the result. Perhaps the prompts and rubrics make a difference. Finding bugs is also one of the better use cases, because it is essentially a search problem. As long as models are non-deterministic and there is some diversity in training data, an ensemble that iterates on the problem is more likely to cover the ground needed to solve it.
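A rough sketch of that coverage argument, with toy numbers rather than measurements. It assumes each attempt finds the bug independently with the same probability, which real model samples only approximate since their errors are correlated. The function name `coverage_probability` and the 0.2 per-attempt rate are made up for illustration.

```python
def coverage_probability(p_single: float, n_attempts: int) -> float:
    """Chance that at least one of n independent attempts succeeds.

    Assumes every attempt succeeds with the same probability p_single
    and that attempts are independent (an idealization).
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if n_attempts < 0:
        raise ValueError("n_attempts must be non-negative")
    # All attempts fail with probability (1 - p)^n; take the complement.
    return 1.0 - (1.0 - p_single) ** n_attempts


if __name__ == "__main__":
    # Hypothetical per-attempt success rate of 20%.
    for n in (1, 3, 5, 10):
        print(f"{n:>2} attempts: {coverage_probability(0.2, n):.3f}")
```

The more correlated the attempts are, the less each extra one adds, which is why diversity across models and prompts matters here.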
Some tasks benefit from this approach more than others. There was a paper from Google on a very similar system they built, which achieved what was then SOTA on planning and pathfinding benchmarks.
edit:
Mind Evolution paper https://deepmind.google/research/publications/122391/
(That was a month after I published llm-consortium :) https://xcancel.com/karpathy/status/1870692546969735361