Kimi K2.5 Technical Report [pdf]

4 hours ago (github.com)

I've been using this model (as a coding agent) for the past few days, and it's the first time I've felt that an open source model really competes with the big labs. So far it's been able to handle most things I've thrown at it. I'm almost hesitant to say that this is as good as Opus.

It's interesting to note that a model that can OpenAI is valued almost 400 times more than moonshotai, despite their models being surprisingly close.

  • Well to be the devil's advocate: One is a household name that holds most of the world's silicon wafers for ransom, and the other sounds like a crypto scam. Also estimating valuation of Chinese companies is sort of nonsense when they're all effectively state owned.

I've been quite satisfied lately with MiniMax M-2.1 in opencode.

How does Kimi 2.5 compare to it in real world scenarios?

  • A lot better in my experience. M2.1 to me feels between haiku and sonnet. K2.5 feels close to opus. That's based on my testing of removing some code and getting it to reimplement based on tests. Also the design/spec writing feels great. You can still test k2.5 for free in OpenCode today.

    • Well, Minimax was the equivalent of Sonnet in my testing. If Kimi approach Opus, that would be great.

I really like the agent swarm thing, is it possible to use that functionality with OpenCode or is that a Kimi CLI specific thing? Does the agent need to be aware of the capability?

  • It seems to work with OpenCode, but I can't tell exactly what's going on -- I was super impressed when OpenCode presented me with a UI to switch the view between different sub-agents. I don't know if OpenCode is aware of the capability, or the model is really good at telling the harness how to spawn sub-agents or execute parallel tool calls.

I wonder how K2.5 + OpenCode compares to Opus with CC. If it is close I would let go of my subscription, as probably a lot of people.

  • It is not opus. It is good, works really fast and suprisingly through about its decisions. However I've seen it hallucinate things.

    Just today I asked for a code review and it flagged a method that can be `static`. The problem is it was already static. That kind of stuff never happens with Opus 4.5 as far as I can tell.

    Also, in an opencode Plan mode (read only). It generated a plan and instead of presenting it and stopping, decided to implement it. Could not use the edit and write tools because the harness was in read only mode. But it had bash and started using bash to edit stuff. Wouldn't just fucking stop even though the error messages it received from opencode stated why. Its plan and the resulting code was ok so I let it go crazy though...

  • I've been using K2.5 with OpenCode to do code assessments/fixes and Opus 4.5 with CC to check the work, and so far so good. Very impressed with it so far, but I don't feel comfortable canceling my Claude subscription just yet. Haven't tried it on large feature implementations.