Comment by irthomasthomas
1 day ago
You can spin up a version of this at home using simonw's LLM CLI with the llm-consortium plugin.
Bonus 1: Use any combination of models. Mix n match models from any lab.
Bonus 2: Serve your custom consortium on a local API from a single command using the llm-model-gateway plugin and use it in your apps and coding assistants.
https://x.com/karpathy/status/1870692546969735361
uv tool install llm
llm install llm-consortium
llm consortium save gthink-n5 -m gemini-pro -n 5 --arbiter gemini-flash --confidence-threshold 99 --max-iterations 4
llm install llm-model-gateway
llm serve --host 0.0.0.0
curl http://0.0.0.0:8000/v1/chat/completions \
-X POST \
-H "Content-Type: application/json" \
-d '{
"model": "gthink-n5",
"messages": [{"role": "user", "content": "find a polynomial algorithm for graph-isomorphism"}]
}'
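The same request can be made from Python using only the standard library. This is a sketch assuming the gateway exposes the usual OpenAI-compatible /v1/chat/completions shape shown in the curl example; the host, port, and model name are taken from the commands above:

```python
import json
import urllib.request

# Local gateway endpoint from the `llm serve` example above.
URL = "http://0.0.0.0:8000/v1/chat/completions"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for the local gateway."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("gthink-n5", "find a polynomial algorithm for graph-isomorphism")
# Only works once `llm serve` is running:
# resp = urllib.request.urlopen(req)
# print(json.load(resp)["choices"][0]["message"]["content"])
```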
You can also build a consortium of consortiums like so:
llm consortium save gem-squared -m gthink-n5 -n 2 --arbiter gemini-flash
Or even make the arbiter a consortium:
llm consortium save gem-cubed -m gthink-n5 -n 2 --arbiter gthink-n5 --max-iterations 2
or go openweights only:
llm consortium save open-council -m qwen3:2 -m kimi-k2:2 -m glm-4.5:2 -m mistral:2 --arbiter minimax-m1 --min-iterations 2 --confidence-threshold 95
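One thing to keep in mind: nesting multiplies model calls. A back-of-envelope sketch, assuming one call per member plus one arbiter call per round (the real plugin also iterates, so treat these as lower bounds):

```python
def calls_per_round(members: int, inner_calls: int = 1, arbiter_calls: int = 1) -> int:
    """Rough model-call count for one consortium round: every member is
    queried, then the arbiter synthesizes. Ignores iteration loops/retries."""
    return members * inner_calls + arbiter_calls

# gthink-n5: 5 members + 1 arbiter per round
gthink = calls_per_round(5)                                               # 6
# gem-squared: 2 members, each itself a full gthink-n5, plus a plain arbiter
gem_squared = calls_per_round(2, inner_calls=gthink)                      # 13
# gem-cubed: 2 gthink-n5 members AND a gthink-n5 arbiter
gem_cubed = calls_per_round(2, inner_calls=gthink, arbiter_calls=gthink)  # 18
print(gthink, gem_squared, gem_cubed)
```

So even a single round of gem-cubed is roughly 18 underlying model calls, before any confidence-threshold iterations kick in.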
1. Why do you say this is a version of Gemini Deep Think? It seems like there could be multiple ways to build a multi-agent model to explore a space.
2. The covariance between models leads to correlated errors, lowering the individual effectiveness of each contributing model. It would seem to me that you'd want to find a set of model architectures/prompt_configs that minimizes covariance while maintaining individual accuracy, on a benchmark set of problems that have multiple provable solutions (i.e. not one path to a solution that is objectively correct).
1. I didn't mean to suggest it's a clone of Deep Think, which is proprietary. I meant that it's a version of parallel reasoning. Got the idea from Karpathy's tweet in December and built it. Then DeepMind published the "Evolving Deeper LLM Thinking" paper in January with similar concepts. Great minds, I guess? https://arxiv.org/html/2501.09891v1
2. The correlated errors thing is real, though I'd argue it's not always a dealbreaker. Sometimes you want similar models for consistency, sometimes you want diversity for coverage. The plugin lets you do either - mix Claude with kimi and Qwen if you want, or run 5 instances of the same model. The "right" approach probably depends on your use case.
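To make the trade-off concrete, here's a toy Monte-Carlo sketch (my own illustration, not how the plugin scores anything): each of 5 models is right with probability 0.7, and with probability rho a model just copies a shared draw, a crude stand-in for correlated errors. Majority vote gains shrink as rho grows:

```python
import random

def majority_vote_accuracy(n_models=5, p_correct=0.7, rho=0.0,
                           trials=20000, seed=0):
    """Estimate majority-vote accuracy when each model is independently
    right with probability p_correct, except that with probability rho
    it copies a shared draw (modeling correlated errors)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        shared = rng.random() < p_correct
        votes = [
            shared if rng.random() < rho else (rng.random() < p_correct)
            for _ in range(n_models)
        ]
        wins += sum(votes) > n_models / 2
    return wins / trials

independent = majority_vote_accuracy(rho=0.0)  # ~0.84: diversity lifts 0.7 models
correlated = majority_vote_accuracy(rho=0.8)   # ~0.71: most of the gain is gone
print(independent, correlated)
```

With independent errors the ensemble beats its members by a wide margin; with heavily correlated errors it drifts back toward a single model's accuracy, which is the case for mixing labs/architectures when coverage matters.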
Is the European Union a consortium of consortiums?
Thanks! Do you happen to know if there are any OpenWebUI plugins similar to this?
You can use this with OpenWebUI already. Just llm install llm-model-gateway. Then, after you save a consortium, run llm serve --host 0.0.0.0. This gives you an OpenAI-compatible endpoint which you can add to your chat client.
I am not seeing this llm serve command
it's a separate plugin rn. llm install llm-model-gateway