Comment by Tepix
9 hours ago
Huggingface Link: https://huggingface.co/moonshotai/Kimi-K2.5
1T parameters, 32B active parameters.
License: MIT with the following modification:
Our only modification part is that, if the Software (or any derivative works thereof) is used for any of your commercial products or services that have more than 100 million monthly active users, or more than 20 million US dollars (or equivalent in other currencies) in monthly revenue, you shall prominently display "Kimi K2.5" on the user interface of such product or service.
One. Trillion. Even on native int4 that’s… half a terabyte of vram?!
Technical awe aside at this marvel that cracks the 50th percentile of HLE, the snarky part of me says there’s only half the danger in giving away something nobody can run at home anyway…
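For anyone who wants to check the "half a terabyte" figure, here is a rough sketch of the arithmetic. It counts weights only (no KV cache, activations, or quantization scale overhead), and the function name is just made up for illustration:

```python
# Back-of-the-envelope weight footprint for a 1T-parameter model.
# Assumption: weights only; real checkpoints carry extra overhead.

def weight_bytes(n_params: float, bits_per_param: float) -> float:
    """Bytes needed to store n_params weights at the given bit width."""
    return n_params * bits_per_param / 8

TOTAL_PARAMS = 1e12  # "1T parameters" from the model card summary

for bits in (16, 8, 4):
    gb = weight_bytes(TOTAL_PARAMS, bits) / 1e9
    print(f"{bits:>2}-bit: {gb:,.0f} GB")
```

At 4 bits per parameter this comes out to 500 GB, which is where the half-terabyte number comes from.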
The model absolutely can be run at home. There even is a big community around running large models locally: https://www.reddit.com/r/LocalLLaMA/
The cheapest way is to stream it from a fast SSD, but it will be quite slow (one token every few seconds).
The next step up is an old server with lots of RAM and many memory channels with maybe a GPU thrown in for faster prompt processing (low two digits tokens/second).
At the high end, there are servers with multiple GPUs with lots of VRAM or multiple chained Macs or Strix Halo mini PCs.
The key enabler here is that the models are MoE (Mixture of Experts), which means that only a small(ish) part of the model is required to compute the next token. In this case, there are 32B active parameters, which is about 16GB at 4 bit per parameter. This only leaves the question of how to get those 16GB to the processor as fast as possible.
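A rough way to see why memory bandwidth decides the speed: if every active weight has to be read once per token, bandwidth divided by 16 GB gives an upper bound on tokens per second. The sketch below assumes exactly that and nothing more. The bandwidth figures are ballpark assumptions for illustration, not measurements, and real throughput will be lower (KV cache reads, compute, expert routing overhead):

```python
# Upper bound on decode speed when memory bandwidth is the bottleneck.
# Assumption: each active weight is read once per generated token.

ACTIVE_PARAMS = 32e9   # active parameters per token (from the comment above)
BITS_PER_PARAM = 4     # 4-bit quantization

def active_bytes(active_params: float, bits_per_param: float) -> float:
    """Bytes of weights touched to compute one token."""
    return active_params * bits_per_param / 8

def max_tokens_per_second(bandwidth_bytes_per_s: float, bytes_per_token: float) -> float:
    """Bandwidth-bound ceiling on tokens per second."""
    return bandwidth_bytes_per_s / bytes_per_token

per_token = active_bytes(ACTIVE_PARAMS, BITS_PER_PARAM)  # 16e9 bytes

# Hypothetical bandwidth figures, chosen only to illustrate the tiers.
tiers = {
    "fast NVMe SSD (~7 GB/s)": 7e9,
    "multi-channel server RAM (~200 GB/s)": 200e9,
}
for name, bandwidth in tiers.items():
    print(f"{name}: at most {max_tokens_per_second(bandwidth, per_token):.2f} tokens/s")
```

With these assumed numbers the SSD case lands at roughly one token every two to three seconds and the server RAM case in the low double digits, which lines up with the tiers described above.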
It’s often pointed out in the first sentence of a comment that a model can be run at home; then (maybe) towards the end of the comment it’s mentioned that it’s quantized.
Back when 4k movies needed expensive hardware, no one was saying they could play 4k on a home system, then later mentioning they actually scaled down the resolution to make it possible.
The degree of quality loss is not often characterized. Which makes sense because it’s not easy to fully quantify quality loss with a few simple benchmarks.
By the time it’s quantized to 4 bits, 2 bits or whatever, does anyone really have an idea of how much they’ve gained vs just running a model that is sized more appropriately for their hardware, but not lobotomized?
> The model absolutely can be run at home. There even is a big community around running large models locally
IMO 1T parameters and 32B active is a different scale from what most people are talking about when they say local LLMs. Totally agree there will be people messing with this, but the real value in local LLMs is that you can actually use them and get value from them with standard consumer hardware. I don't think that's really possible with this model.
How do you split the model between multiple GPUs?
>The model absolutely can be run at home.
There is a huge difference between "look I got it to answer the prompt: '1+1='"
and actually using it for anything of value.
I remember early on people bought Macs (or some marketing team was shoveling it) and proposed that people could reasonably run the 70B+ models on them.
They were talking about 'look it gave an answer', not 'look this is useful'.
While it was a bit obvious that an 'integrated GPU' is not Nvidia VRAM, we did have one Mac laptop at work that validated this.
It's cool these models are out in the open, but it's going to be a decade before people are running them at a useful level locally.
Which conveniently fits on one 8xH100 machine. With 100-200 GB left over for overhead, kv-cache, etc.
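A quick sanity check on the 8xH100 claim, as a sketch. It assumes the 80 GB H100 variant and the roughly 500 GB int4 weight footprint mentioned earlier in the thread; the helper name is made up for illustration:

```python
# Does ~500 GB of int4 weights fit on one 8-GPU node, and what is left over?
# Assumptions: 80 GB per GPU, weights split evenly, no per-GPU runtime overhead counted.

def headroom_gb(num_gpus: int, gb_per_gpu: float, weights_gb: float) -> float:
    """VRAM left for KV cache and other overhead after loading the weights."""
    return num_gpus * gb_per_gpu - weights_gb

total_gb = 8 * 80
left_gb = headroom_gb(num_gpus=8, gb_per_gpu=80, weights_gb=500)
print(f"total VRAM: {total_gb} GB, left after weights: {left_gb:.0f} GB")
```

Under these assumptions about 140 GB remains, inside the 100-200 GB range quoted above.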
That's what intelligence takes. Most of intelligence is just compute.
Hey, have they open sourced all of Kimi K2.5 (thinking, instruct, agent, agent swarm [beta])?
Because I feel like they mentioned that agent swarm is available in their API, and that made me feel as if it wasn't open (weights). Please let me know whether all of them are open source or not.
I'm assuming the swarm part is all harness. Well I mean a harness and way of thinking that the weights have just been fine tuned to use.
> or more than 20 million US dollars (or equivalent in other currencies) in monthly revenue, you shall prominently display "Kimi K2.5" on the user interface of such product or service.
Why not just say "you shall pay us 1 million dollars"?
? They prefer the branding. The license just says you have to say it was them if you make more than $240M a year ($20M a month) or have more than 100 million monthly active users.
Companies with $20M revenue will not normally have spare $1M available. They'd get more money by charging reasonable subscriptions than by using lawyers to chase sudden company-ending fees.
It's monthly :) Companies with $240M in annual revenue will absolutely find a way to fork over $1M if they need to. Kimi most likely sees the eyeballs from free advertising as more profitable in the grander scheme of things.
I assume this allows them to sue for different amounts. And not discourage too many people from using it.