
Comment by walrus01

8 hours ago

People thinking of self-hosting Kimi K2.6 had better be prepared for how big it is.

The Q8_K_XL quantization, for instance, is around 600GB on disk. I would bet about 700GB of VRAM is needed.

Quantizations below Q8 are probably not worth the quality loss.

Or 2.05TB on disk for the full-precision GGUF.

https://huggingface.co/unsloth/Kimi-K2.6-GGUF
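
As a rough sanity check on those sizes (a back-of-envelope sketch: the ~1T total parameter count is an assumption based on the K2 family, and real files carry overhead for embeddings and mixed-precision tensors):

    # back-of-envelope size check; the ~1T parameter count is an assumption
    awk 'BEGIN {
      p = 1.0e12                                # assumed total parameters
      printf "BF16 GGUF, 2 bytes/param: ~%.1f TB\n", p * 2 / 1e12
      # serving also needs KV cache and activations on top of the weights,
      # which is why ~600GB of Q8 weights plausibly needs ~700GB of VRAM
    }'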

If you can afford hardware that runs Kimi K2.6 at any decent speed for more than one simultaneous user, you probably have a whole team on staff who already know how to benchmark it against Claude, GPT-5.5, etc.

While most people would not be able to run Kimi K2.6 fast enough for interactive chat, low speed matters much less for a coding assistant, especially since many tasks can be batched so they all progress during a single pass over the weights.
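
To make the single-pass point concrete: decode is typically memory-bandwidth-bound, so one sweep of the weights through the GPUs yields one token for every request in the batch. A sketch with purely illustrative numbers (the bandwidth figure and the dense-sweep simplification are assumptions, not benchmarks):

    # illustrative decode-throughput sketch; all numbers are assumptions
    awk 'BEGIN {
      weights_gb = 600                  # Q8 weights resident in VRAM
      bw_gbps    = 8 * 1800             # assumed aggregate bandwidth, 8 GPUs
      step_s     = weights_gb / bw_gbps # one full sweep of the weights
      for (b = 1; b <= 16; b *= 4)      # batch sizes 1, 4, 16
        printf "batch %2d: ~%.0f tok/s aggregate\n", b, b / step_s
      # a MoE reads fewer weights per token at batch 1; large batches touch
      # most experts, which is exactly the batched regime described above
    }'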

If you run it on your own hardware, you can run it 24/7 without worrying about token prices or subscription limits, and you can likely get more work done even on much slower hardware. Customizing an open-source harness can also yield much greater efficiency than something like Claude Code.

For any serious application, you might be more limited by your ability to review the code than by hardware speed.

  • DeepSeek V4 Pro is far more effective at batching multiple tasks together since its KV cache is so much lighter: a maximum of ~10GB at the full 1M context, scaling linearly with context according to the DeepSeek V4 release paper. That's extremely impressive; it unlocks batching, agent swarms, etc. even on severely memory-constrained platforms, especially at smaller max context.
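
    Taking the ~10GB-at-1M figure at face value and scaling linearly, a swarm's KV budget is easy to sketch (the swarm size and context below are assumed purely for illustration):

      # KV-cache budget sketch, assuming the ~10GB @ 1M-token figure above
      # really does scale linearly with context
      awk 'BEGIN {
        b_tok  = 10e9 / 1e6             # ~10 KB of KV cache per token
        agents = 32; ctx = 128e3        # assumed swarm size and context
        printf "per token: ~%.0f KB\n", b_tok / 1e3
        printf "%d agents @ %dK ctx: ~%.0f GB of KV\n",
               agents, ctx / 1e3, agents * ctx * b_tok / 1e9
      }'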

Kimi is a natively quantized model; the lossless full-precision release is 595GB. Your own link mentions that.

  • So, realistically, $100K for an 8x RTX 6000 Pro system that can run it at a usable rate.

    • I think people will always disagree on what qualifies as a "usable rate". But keep in mind that practically no one sensible is running the latest Opus or GPT around the clock, especially not at sustainable, unsubsidized prices. With open-weight models, that's easy to do.
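
      One way to frame "usable rate" is to amortize the hardware and compare against API pricing; every number below is an assumption, not a quote:

        # rough amortization sketch; all figures are assumptions
        awk 'BEGIN {
          capex = 100e3                 # system cost from above, USD
          hours = 3 * 365 * 24          # assumed 3-year useful life
          kw    = 5; usd_kwh = 0.15     # assumed draw and power price
          usd_h = capex / hours + kw * usd_kwh
          printf "~$%.2f/hour to run 24/7\n", usd_h
          # at an assumed $10 per 1M output tokens, break-even is roughly
          printf "break-even: ~%.0fK API tokens/hour\n", usd_h / 10 * 1e3
        }'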


  • the 'unsloth' link above is a third party's quantization to Q8; the original release is considerably larger than 600GB:

    https://huggingface.co/moonshotai/Kimi-K2.6

    • No.

      I have downloaded Kimi-K2.6 (the original release).

        du -sh moonshotai/Kimi-K2.6 
        555G moonshotai/Kimi-K2.6
      
        du -s moonshotai/Kimi-K2.6 
        581255612 moonshotai/Kimi-K2.6
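        # note: du -s reports 1K blocks, so 581255612 KiB ≈ 554 GiB ≈ 595 GB,
        # consistent with the 595GB lossless figure mentioned upthread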
      

      For comparison (sorted by decreasing size: three bigger models and three smaller ones, all recently launched):

        du -sh zai-org/GLM-5.1
        1.4T zai-org/GLM-5.1
        du -sh XiaomiMiMo/MiMo-V2.5-Pro 
        963G XiaomiMiMo/MiMo-V2.5-Pro
        du -sh deepseek-ai/DeepSeek-V4-Pro
        806G deepseek-ai/DeepSeek-V4-Pro
      
        du -sh XiaomiMiMo/MiMo-V2.5 
        295G XiaomiMiMo/MiMo-V2.5
        du -sh MiniMaxAI/MiniMax-M2.7
        215G MiniMaxAI/MiniMax-M2.7
        du -sh deepseek-ai/DeepSeek-V4-Flash
        149G deepseek-ai/DeepSeek-V4-Flash

    • That page mentions that the model is natively INT4 for most of its parameters, and 600GB is in the ballpark of what's available for download there.
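
      A quick check on that (assuming ~1T total parameters, which is an assumption; the gap up to 595GB is plausibly attention/embedding tensors kept at higher precision, plus metadata):

        # INT4 sizing check; the ~1T parameter count is an assumption
        awk 'BEGIN {
          p = 1.0e12                    # assumed total parameters
          printf "pure INT4, 0.5 bytes/param: ~%.0f GB\n", p * 0.5 / 1e9
        }'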