Pelican on a bicycle result: https://simonwillison.net/2025/Jul/11/kimi-k2/
At this point, they have to be training on it. At what point will you start using something else?
Once I get a picture that genuinely looks like a pelican riding a bicycle!
wow!
Big release: https://huggingface.co/moonshotai/Kimi-K2-Instruct (model weights are 958.52 GB).
Paired with programming tools like Claude Code, it could be a low-cost, open-source replacement for Sonnet.
How do you run a 1T-param model at low cost?
According to the benchmarks it's closer to Opus, though I'd venture that's primarily for English and Chinese.
I've only started using Claude, Gemini, etc. in the last few months (I guess it comes with age; I'm no longer interested in trying the latest "tech"). I assume those are "non-agentic" models.
From reading articles online, "agentic" means something like having a virtual assistant with "hands" that can google, open apps, etc., on its own.
Why not use an existing "non-agentic" model and "orchestrate" it using LangChain, MCP, etc.? Why create a new breed of model?
I'm sorry if my questions sound silly. Following AI world is like following JavaScript world.
Reasonable question, simple answer: "new breed of model" is overstating it. All of these models have been fine-tuned with reinforcement learning on a variety of tasks for years; what has changed is the set of tasks (and maybe the amount of RL), which now includes far more tool-use tasks, and that has made the models much, much better at tool use. The explosion of tools like Claude Code this year is driven by the models simply being more effective at it. The external orchestration you mention is what people did before this year, and it did not work as well.
It is not a silly question. The various flavors of LLM have issues with reliability: in software we expect five 9s, and LLMs aren't even at one 9. Early on it was the reliability of their JSON output. Then instruction following. Then tool use. Now it's "computer use" and orchestration.
Creating models for this specific problem domain will have a better chance at reliability, which is not a solved problem.
Jules is the Gemini coder that links to GitHub. Half the time it doesn't create a pull request and instead assumes I'll do some testing or something. It's wild.
"Agentic" and "agent" can mean pretty much anything, there are a ton of different definitions out there.
When an LLM is billed as "agentic" it usually means it's been optimized for tool use. Pretty much all the big models (and most of the small ones) are designed for tool use these days; it's an incredibly valuable feature for a model to offer.
I don't think this new model is any more "agentic" than o3, o4-mini, Gemini 2.5, or Claude 4. All of those models are trained for tool use, and all of them are very competent at running tool calls in a loop to achieve a goal they've been given.
> I'm sorry if my questions sound silly. Following AI world is like following JavaScript world.
You are more right than you could possibly imagine.
TL;DR: "agentic" just means "can call tools it's been given access to, autonomously, and then access the output" combined with an infinite loop in which the model runs over and over (compared to a one-off interaction like you'd see in ChatGPT). MCP is essentially one of the methods to expose the tools to the model.
Is this something the models could do for a long while with a wrapper? Yup. "Agentic" is the current term for it, that's all. There's some hype around "agentic AI" that's unwarranted, but part of the reason for the hype is that models have become better at tool calling and using data in their context since the early days.
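To make that concrete, here is a minimal sketch of such a loop in Python. The call_model() function and the tool registry are hypothetical stand-ins, not any real provider's API; the point is the shape: call the model, run whatever tool it asks for, feed the output back, and repeat until it answers in plain text.

```python
# Minimal sketch of an "agentic" loop. call_model() and the tools are
# hypothetical stand-ins, not a real SDK; real provider APIs differ in
# details but follow the same shape.
import json

def search_web(query: str) -> str:
    """Hypothetical tool the model has been given access to."""
    return f"(search results for {query!r})"

TOOLS = {"search_web": search_web}

def call_model(messages: list) -> dict:
    """Stand-in for a real LLM API call: a real model returns either
    plain text or a structured tool-call request."""
    return {"type": "text", "content": "done"}  # placeholder reply

def run_agent(task: str, max_steps: int = 10):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):              # the loop that makes it "agentic"
        reply = call_model(messages)
        if reply["type"] == "tool_call":
            result = TOOLS[reply["name"]](**json.loads(reply["arguments"]))
            messages.append({"role": "tool", "content": result})  # feed output back
        else:
            return reply["content"]         # model decided it's finished
    return None                             # safety valve after max_steps
```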
If the SWE-bench results are to be believed... this looks best in class right now for a local LLM. To be fair, show me the guy who is running this locally...
It's challenging, but not impossible. With 2-bit quantisation, only about 250 gigabytes of RAM is required. It doesn't have to be VRAM either, and you can mix and match GPU+CPU inference.
In addition, some people on /r/localLlama are having success with streaming the weights off SSD storage at 1 token/second, which is about the rate I get for DeepSeek R1.
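The ~250 GB figure falls straight out of the parameter count; a quick back-of-the-envelope check:

```python
# Back-of-the-envelope memory for a 1T-parameter model at 2-bit
# quantisation (ignores KV cache, activations, and runtime overhead).
params = 1_000_000_000_000      # 1T parameters
bits_per_param = 2              # 2-bit quantisation
gigabytes = params * bits_per_param / 8 / 1e9
print(f"{gigabytes:.0f} GB")    # -> 250 GB, matching the estimate above
```

And since only ~32B of the 1T parameters are active per token, an SSD-streaming setup only has to read a fraction of the weights on each forward pass, which is roughly why 1 token/second is feasible at all.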
Quite impressive benchmarks. How come I don't see Kimi in the Artificial Analysis benchmarks?
This is both the largest open-source model release thus far and the largest Muon training run.
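For anyone unfamiliar: Muon is an optimizer that orthogonalizes the momentum update of each weight matrix with a few Newton-Schulz iterations. A rough sketch, with coefficients taken from Keller Jordan's public reference implementation; treat it as illustrative, not a drop-in optimizer:

```python
import torch

def newton_schulz(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize G with a quintic Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315   # coefficients from the reference impl
    X = G / (G.norm() + 1e-7)           # normalize so the iteration converges
    transposed = G.size(0) > G.size(1)
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

def muon_step(weight, grad, momentum_buf, lr=0.02, beta=0.95):
    """One Muon update for a single 2-D weight tensor."""
    momentum_buf.mul_(beta).add_(grad)                   # standard SGD momentum
    weight.add_(newton_schulz(momentum_buf), alpha=-lr)  # orthogonalized step
```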
I really, really want to try this model for free, since I just don't have a GPU.
Is there any way I could do so?
OpenRouter? Or does Kimi have their own website? Just curious to really try it out!
Kimi.com
> 1T total / 32B active MoE model
Is this the largest open-weight model?
No.
At 1T MoE on 15.5T tokens, K2 is one of the largest open source models to date. But BAAI's TeleFM is 1T dense on 15.7T tokens: https://huggingface.co/CofeAI/Tele-FLM-1T
You can always check here: https://lifearchitect.ai/models-table/
I believe so.
Grok-1 is 314B, DeepSeek-V3 is 671B, and most recent open-weights models are around 70B~300B.
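For context on the "total vs. active" distinction above: in a mixture-of-experts model, a router sends each token through only a few experts, so compute per token scales with the active parameters while memory has to hold all of them. A toy sketch with illustrative sizes (not K2's actual architecture):

```python
import torch

# Toy MoE layer: all experts live in memory (total params), but each
# token only runs through top_k of them (active params).
d, n_experts, top_k = 256, 8, 2
experts = torch.nn.ModuleList([torch.nn.Linear(d, d) for _ in range(n_experts)])
router = torch.nn.Linear(d, n_experts)

def moe_forward(x: torch.Tensor) -> torch.Tensor:    # x: (batch, d)
    weights, idx = router(x).topk(top_k, dim=-1)     # pick top_k experts per token
    weights = weights.softmax(dim=-1)
    out = torch.zeros_like(x)
    for b in range(x.size(0)):
        for j in range(top_k):                       # only top_k experts execute
            out[b] += weights[b, j] * experts[int(idx[b, j])](x[b])
    return out
```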
How does it stack up against the new Grok 4 model?
Would be hilarious if Zuck, with his billion-dollar poaching, failed to beat budget Chinese models.
That reminds me of a thought I had about the poachings.
The poaching was probably aimed more at hamstringing Meta's competition: the disruption caused by researchers leaving in droves is probably more severe than the benefit of having them on board. Unless they are gods, of course.
Wikipedia lists a FAIR alum as a cofounder of this "Moonshot AI". That probably makes it funnier.