Comment by zeroxfe

1 month ago

I've been using this model (as a coding agent) for the past few days, and it's the first time I've felt that an open source model really competes with the big labs. So far it's been able to handle most things I've thrown at it. I'm almost hesitant to say that this is as good as Opus.

65 comments

zeroxfe

rubslopes 1 month ago

Also my experience. I've been going back and forth between Opus and Kimi for the last few days, and, at least for my CRUD webapps, I would say they are both on the same level.

armcat 1 month ago

Out of curiosity, what kind of specs do you have (GPU / RAM)? I saw the requirements and it's a beyond my budget so I am "stuck" with smaller Qwen coders.

zeroxfe 1 month ago
I'm not running it locally (it's gigantic!) I'm using the API at https://platform.moonshot.ai
- BeetleB 1 month ago
  
  Just curious - how does it compare to GLM 4.7? Ever since they gave the $28/year deal, I've been using it for personal projects and am very happy with it (via opencode).
  https://z.ai/subscribe
  
  9 replies →
- HarHarVeryFunny 1 month ago
  
  It is possible to run locally though ... I saw a video of someone running one of the heavily quantized versions on a Mac Studio, and performing pretty well in terms of speed.
  I'm guessing a 256GB Mac Studio, costing $5-6K, but that wouldn't be an outrageous amount to spend for a professional tool if the model capability justified it.
  
  2 replies →
- jgalt212 1 month ago
  
  What's the point of using an open source model if you're not self-hosting?
  
  5 replies →
- rc1 1 month ago
  
  How long until this can be run on consumer grade hardware or a domestic electricity supply I wonder.
  Anyone have a projection?
  
  7 replies →
Carrok 1 month ago

Not OP but OpenCode and DeepInfra seems like an easy way.
observationist 1 month ago

API costs on these big models over private hosts tend to be a lot less than API calls to the big 4 American platforms. You definitely get more bang for your buck.
kristianp 1 month ago
Note that Kimi K2x is natively 4 bit int, which reduces the memory requirements somewhat.
- kristianp 24 days ago
  
  Here's the citation for that, I think its not in the Technical Report. https://huggingface.co/moonshotai/Kimi-K2.5#4-native-int4-qu...
tgrowazay 1 month ago
Just pick up any >240GB VRAM GPU off your local BestBuy to run a quantized version.
> The full Kimi K2.5 model is 630GB and typically requires at least 4× H200 GPUs.
- CamperBob2 1 month ago
  
  You could run the full, unquantized model at high speed with 8 RTX 6000 Blackwell boards.
  I don't see a way to put together a decent system of that scale for less than $100K, given RAM and SSD prices. A system with 4x H200s would cost more like $200K.
  
  1 reply →

timwheeler 1 month ago

Did you use Kimi Code or some other harness? I used it with OpenCode and it was bumbling around through some tasks that Claude handles with ease.

zedutchgandalf 1 month ago
Are you on the latest version? They pushed an update yesterday that greatly improved Kimi K2.5’s performance. It’s also free for a week in OpenCode, sponsored by their inference provider
- ekabod 1 month ago
  
  But it may be a quantized model for the free version.

thesurlydev 1 month ago

Can you share how you're running it?

eknkc 1 month ago
I've been using it with opencode. You can either use your kimi code subscription (flat fee), moonshot.ai api key (per token) or openrouter to access it. OpenCode works beautifully with the model.
Edit: as a side note, I only installed opencode to try this model and I gotta say it is pretty good. Did not think it'd be as good as claude code but its just fine. Been using it with codex too.
- Imustaskforhelp 1 month ago
  
  I tried to use opencode for kimi k2.5 too but recently they changed their pricing from 200 tool requests/5 hour to token based pricing.
  I can only speak from the tool request based but for some reason anecdotally opencode took like 10 requests in like 3-4 minutes where Kimi cli took 2-3
  So I personally like/stick with the kimi cli for kimi coding. I haven't tested it out again with OpenAI with teh new token based pricing but I do think that opencode might add more token issue.
  Kimi Cli's pretty good too imo. You should check it out!
  https://github.com/MoonshotAI/kimi-cli
  
  1 reply →
zeroxfe 1 month ago
Running it via https://platform.moonshot.ai -- using OpenCode. They have super cheap monthly plans at kimi.com too, but I'm not using it because I already have codex and claude monthly plans.
- esafak 1 month ago
  
  Where? https://www.kimi.com/code starts at $19/month, which is same as the big boys.
- UncleOxidant 1 month ago
  
  so there's a free plan at moonshot.ai that gives you some number of tokens without paying?
JumpCrisscross 1 month ago
> Can you share how you're running it?
Not OP, but I've been running it through Kagi [1]. Their AI offering is probably the best-kept secret in the market.
[1] https://help.kagi.com/kagi/ai/assistant.html
- deaux 1 month ago
  
  Doesn't list Kimi 2.5 and seems to be chat-only, not API, correct?
  
  1 reply →
explorigin 1 month ago
https://unsloth.ai/docs/models/kimi-k2.5
Requirements are listed.
- KolmogorovComp 1 month ago
  
  To save everyone a click
  > The 1.8-bit (UD-TQ1_0) quant will run on a single 24GB GPU if you offload all MoE layers to system RAM (or a fast SSD). With ~256GB RAM, expect ~10 tokens/s. The full Kimi K2.5 model is 630GB and typically requires at least 4× H200 GPUs. If the model fits, you will get >40 tokens/s when using a B200. To run the model in near full precision, you can use the 4-bit or 5-bit quants. You can use any higher just to be safe. For strong performance, aim for >240GB of unified memory (or combined RAM+VRAM) to reach 10+ tokens/s. If you’re below that, it'll work but speed will drop (llama.cpp can still run via mmap/disk offload) and may fall from ~10 tokens/s to <2 token/s. We recommend UD-Q2_K_XL (375GB) as a good size/quality balance. Best rule of thumb: RAM+VRAM ≈ the quant size; otherwise it’ll still work, just slower due to offloading.
  
  4 replies →
indigodaddy 1 month ago

Been using K2.5 Thinking via Nano-GPT subscription and `nanocode run` and it's working quite nicely. No issues with Tool Calling so far.
gigatexal 1 month ago
Yeah I too am curious. Because Claude code is so good and the ecosystem so just it works that I’m Willing to pay them.
- Imustaskforhelp 1 month ago
  
  I tried kimi k2.5 and first I didn't really like it. I was critical of it but then I started liking it. Also, the model has kind of replaced how I use chatgpt too & I really love kimi 2.5 the most right now (although gemini models come close too)
  To be honest, I do feel like kimi k2.5 is the best open source model. It's not the best model itself right now tho but its really price performant and for many use cases might be nice depending on it.
  It might not be the completely SOTA that people say but it comes pretty close and its open source and I trust the open source part because I feel like other providers can also run it and just about a lot of other things too (also considering that iirc chatgpt recently slashed some old models)
  I really appreciate kimi for still open sourcing their complete SOTA and then releasing some research papers on top of them unlike Qwen which has closed source its complete SOTA.
  Thank you Kimi!
- epolanski 1 month ago
  
  You can plug another model in place of Anthropic ones in Claude Code.
  
  5 replies →