Comment by irthomasthomas

2 hours ago

I have a version of this called llm-consortium which I originally vibe-coded from a karpathy tweet[0].

  "I find that recently I end up using all of the models and all the time... for a lot of problems they have this 'NP Complete' nature to them, where coming up with a solution is significantly harder than verifying a candidate solution. So your best performance will come from just asking all the models, and then getting them to come to a consensus."

I realized at some point that 'consortium' was not the proper term for what this was doing, since I was creating a kind of llm organization/council, whereas a consortium is a group of organizations. So rather than rename it, I added the ability to create a consortium of consortiums, where each member can itself be a consortium of models. The arbiter can also be a consortium, which enables multi-model judging. This can obviously balloon token usage insanely; I think my record is over 100 models prompted from one prompt.
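To get a feel for how quickly nesting multiplies calls, here is a rough counting sketch. It is not llm-consortium's code: the `Consortium` class and `count_calls` function are hypothetical, the model names are placeholders, and it assumes a single iteration in which every member is prompted `n` times and the arbiter runs once per consortium.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Node = Union[str, "Consortium"]  # a plain model name, or a nested consortium


@dataclass
class Consortium:
    """Hypothetical stand-in for a saved consortium (not the plugin's own type)."""
    members: List[Tuple[Node, int]]  # (member, number of times it is prompted)
    arbiter: Node


def count_calls(node: Node) -> int:
    """Count model calls for one prompt, assuming a single iteration."""
    if isinstance(node, str):
        return 1  # a plain model is one call
    member_calls = sum(n * count_calls(member) for member, n in node.members)
    return member_calls + count_calls(node.arbiter)  # arbiter runs once


# Placeholder names, chosen only for illustration.
leaf = Consortium(members=[("model-a", 5), ("model-b", 5)], arbiter="judge")
mid = Consortium(members=[(leaf, 3)], arbiter=leaf)   # the arbiter is itself a consortium
top = Consortium(members=[(mid, 2)], arbiter="judge")

print(count_calls(leaf), count_calls(mid), count_calls(top))
```

Under those assumptions, two levels of nesting already turn one prompt into dozens of model calls, and extra iterations would multiply that again.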

So to rein in the token explosion somewhat, I added a simple rank mode, which produces only a ranking, and then the top-ranked answer is returned. You can use this in combination with meta-consortiums like this:

  llm consortium save cns-kimi -m k2.7-code -n 5 --arbiter mercury-2 --judging-method rank
  llm consortium save cns-glm -m glm-5.2 -n 5 --arbiter mercury-2 --judging-method rank
  llm consortium save cns-meta-glm-kimi -m cns-glm -m cns-kimi --max-iterations 1 --arbiter qwen-3.5 # judging-method left at default to create a synthesis

This will first send five prompts each to kimi and glm and pick the top-ranked answer from each using the fast mercury-2 model, then create a synthesis from those two responses using a better model like qwen. Mercury-2 is extremely fast and good for rank mode, but for synthesis I prefer a slightly larger model. This matters most when you are using it inside a harness or agent with a strict output format, because then you end up nesting one complex structure inside another (llm-consortium uses structured reasoning with XML tags). Even opus sometimes struggled with this in the few times I tried it, but qwen, glm and kimi have all been reliable arbiters so far.
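The rank-then-synthesize flow described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not how llm-consortium is implemented: all function names here are hypothetical, and the models, ranker and synthesizer are stubs standing in for real LLM calls.

```python
import itertools
from typing import Callable, List

Model = Callable[[str], str]
Ranker = Callable[[str, List[str]], List[int]]     # returns candidate indices, best first
Synthesizer = Callable[[str, List[str]], str]


def rank_consortium(member: Model, n: int, ranker: Ranker) -> Model:
    """Sample the member n times and return only the top-ranked answer."""
    def run(prompt: str) -> str:
        candidates = [member(prompt) for _ in range(n)]
        best_first = ranker(prompt, candidates)
        return candidates[best_first[0]]
    return run


def synthesis_consortium(members: List[Model], synthesizer: Synthesizer) -> Model:
    """Ask each member once (a member may itself be a consortium), then merge."""
    def run(prompt: str) -> str:
        answers = [member(prompt) for member in members]
        return synthesizer(prompt, answers)
    return run


# --- Stubs standing in for real model calls (assumptions for this sketch) ---
def stub_model(name: str) -> Model:
    counter = itertools.count(1)
    return lambda prompt: f"{name} draft {next(counter)}"


def stub_ranker(prompt: str, candidates: List[str]) -> List[int]:
    # Arbitrary stand-in for a fast ranking model: later drafts rank higher.
    return sorted(range(len(candidates)), key=lambda i: candidates[i], reverse=True)


def stub_synthesizer(prompt: str, answers: List[str]) -> str:
    # Stand-in for a larger arbiter model writing a synthesis.
    return " + ".join(answers)


cns_kimi = rank_consortium(stub_model("kimi"), 5, stub_ranker)
cns_glm = rank_consortium(stub_model("glm"), 5, stub_ranker)
cns_meta = synthesis_consortium([cns_glm, cns_kimi], stub_synthesizer)

result = cns_meta("example prompt")
print(result)
```

Because a consortium has the same call signature as a plain model in this sketch, nesting falls out for free, and only the two winning drafts ever reach the more expensive synthesis step.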

If you combine it with the llm-model-gateway plugin you can serve a consortium like a regular model on an openai proxy and the response will be the synthesis, and conversation context is preserved for multi-turn chats.

[0] https://x.com/karpathy/status/1870692546969735361

Further reading:
Mixture-of-agents: https://www.together.ai/blog/together-moa
Google's Mind-Evolution: https://arxiv.org/html/2501.09891v1