Comment by magicalhippo
8 hours ago
Most people won't roll out their own K2 deployment across rented GPUs, so in that sense it doesn't matter that much, they'll be using a paid service which is just as much of a black box as Claude or ChatGPT. For example, on OpenRouter you can select a provider which state they use a given open model, but you have no idea what actually goes on behind the curtains, which quantization levels they use and so on.
That said, I do fully agree that it is valuable to have open near-frontier models, as a balance to the closed ones.
Well you can rent a capable node for a few hours for like $50, install Kimi yourself and verify occasionally whether it works just like in cloud providers.
> but you have no idea what actually goes on behind the curtains, which quantization levels they use and so on.
That would take something close to a global conspiracy of every technologist lying continuously to keep the tweaks secret. If necessary, I personally will rent some servers and run a vanilla Kimi K2.6 deployment for people to use at reasonable prices. I don't expect to ever make good on that threat because they are grim times indeed if I'm the first person doing something AI related, but the skill level required to load up a model behind an API is low.
So it isn't hard to see how there will be unadulterated Kimi models available and from there it is really, really straightfoward to tell if someone is quantising a model; just run some benchmarks against 2 different providers who both claim to serve the same thing. If one is quantising and another isn't there's a big difference in quality.
> If one is quantising and another isn't there's a big difference in quality.
Sure. But the problem is you have to do this continuously to have any measure of confidence, which is expensive. For example, a provider could at any point randomly start serving some fraction of the requests to a quantized model. Either due to "routing error", as Anthropic called one of their model degradation events, or trying to improve bottom line.
There's really no good way to detect this on a few-prompt level without overspending significantly, because they're all black boxes.
It's not really a black box. Useful models becoming fungible is crucial for disincentivizing bad behaviour with model providers. I can't really overstate how different it is from relying on closed models. If you don't like or trust any of the providers on OpenRouter you can rent the GPUs yourself and host it, although this is probably unnecessary.