Comment by zozbot234

9 hours ago

Kimi is a natively quantized model; the lossless, full-precision release is 595GB. Your own link mentions that.

So, realistically, $100K for an 8x RTX 6000 Pro system that can run it at a usable rate.

  • I think people will always disagree on what qualifies as a "usable rate". But keep in mind that practically no one sensible is running the latest Opus or GPT around the clock, especially not at sustainable, unsubsidized prices. With open-weights models, running around the clock is easy.

    • Also, for people doing medical, privacy-sensitive, or otherwise sensitive-data work, there's an almost incalculable value (depending on the industry niche) in having absolutely no external network traffic to any servers/systems you don't fully control.

The 'unsloth' link above is from a third party who quantized it to Q8; the original release is considerably larger than 600GB:

https://huggingface.co/moonshotai/Kimi-K2.6

  • No.

    I have downloaded Kimi-K2.6 (the original release).

      du -sh moonshotai/Kimi-K2.6 
      555G moonshotai/Kimi-K2.6
    
      du -s moonshotai/Kimi-K2.6 
      581255612 moonshotai/Kimi-K2.6
    

    For comparison (sorted by decreasing size: three larger models and three smaller models, all recently launched):

      du -sh zai-org/GLM-5.1
      1.4T zai-org/GLM-5.1
      du -sh XiaomiMiMo/MiMo-V2.5-Pro 
      963G XiaomiMiMo/MiMo-V2.5-Pro
      du -sh deepseek-ai/DeepSeek-V4-Pro
      806G deepseek-ai/DeepSeek-V4-Pro
    
      du -sh XiaomiMiMo/MiMo-V2.5 
      295G XiaomiMiMo/MiMo-V2.5
      du -sh MiniMaxAI/MiniMax-M2.7
      215G MiniMaxAI/MiniMax-M2.7
      du -sh deepseek-ai/DeepSeek-V4-Flash
      149G deepseek-ai/DeepSeek-V4-Flash
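
    The `du -s` and `du -sh` figures above are consistent with each other: `du -s` reports 1 KiB blocks, while `du -sh` rounds up to a human-readable unit. A quick sanity check on the Kimi-K2.6 numbers:

      # du -s reports sizes in 1 KiB blocks; du -sh rounds that to a
      # human-readable unit (GiB here).
      blocks_kib = 581255612              # from `du -s moonshotai/Kimi-K2.6`
      gib = blocks_kib / 1024**2          # KiB -> GiB
      print(f"{gib:.1f} GiB")             # ~554.3 GiB; du -sh rounds up to 555G

    So "555G" from `du -sh` and "581255612" blocks from `du -s` describe the same on-disk size.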

  • That page mentions that the model is natively INT4 for most of its parameters, and ~600GB is in the ballpark of what's available there for download.
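
    A back-of-the-envelope size check supports this. As a sketch only (the ~1T parameter count and the 95% INT4 share below are assumptions for illustration, not figures from the thread), a mostly-INT4 model of that scale lands in the mid-500 GiB range:

      # Assumption: ~1e12 total parameters, ~95% stored at 4 bits (0.5 bytes
      # each), the rest kept in higher precision (2 bytes each, e.g. BF16).
      params = 1.0e12
      int4_fraction = 0.95
      bytes_total = params * (int4_fraction * 0.5 + (1 - int4_fraction) * 2.0)
      print(f"{bytes_total / 1024**3:.0f} GiB")   # mid-500s GiB, consistent with the ~555G download

    At full BF16 precision the same parameter count would instead be roughly 2 bytes per parameter, i.e. well over 1.8 TiB, which is why the native-INT4 release is the practical one.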