Comment by Tepix

15 hours ago

Kimi K2.6 does not run well on 256GB.

Have you tried it? It would be slow for sure, but the main limitation AIUI would actually be storing the context in RAM - models like Kimi and GLM have high demands there which limit your ability to get meaningful aggregate throughput via large batches.
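
To put rough numbers on the context-in-RAM point, here is a back-of-the-envelope sketch. It assumes the standard per-token KV-cache formula for plain multi-head or grouped-query attention (keys plus values, per layer). Every number in it is a hypothetical placeholder, not the actual config of Kimi, GLM, or any other model, and architectures that compress the cache will come out differently.

```python
# Rough KV-cache sizing sketch. All parameter values used below are
# hypothetical placeholders, NOT the real config of any specific model.

def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_elem: int) -> int:
    """Bytes of KV cache one token occupies across all layers.

    The factor of 2 covers one key vector and one value vector per KV head.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def max_concurrent_sequences(ram_budget_bytes: int, context_len: int,
                             per_token_bytes: int) -> int:
    """How many full-length sequences fit in the RAM left over after weights."""
    per_sequence_bytes = per_token_bytes * context_len
    return ram_budget_bytes // per_sequence_bytes


if __name__ == "__main__":
    GiB = 1024 ** 3
    # Hypothetical model shape: 60 layers, 8 KV heads, head_dim 128, fp16 cache.
    per_token = kv_bytes_per_token(layers=60, kv_heads=8, head_dim=128,
                                   bytes_per_elem=2)
    # Hypothetical budget: 64 GiB left for context once the weights are loaded.
    batch = max_concurrent_sequences(64 * GiB, context_len=32768,
                                     per_token_bytes=per_token)
    print(f"{per_token} bytes/token, room for {batch} sequences at 32k context")
```

With those made-up numbers the leftover RAM holds only a handful of long sequences, which is the sense in which context storage caps batch size and therefore aggregate throughput.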

True, I might be thinking of some of the community’s four-Spark clusters for it; it’s already int4, right?