Comment by danielhanchen

2 months ago

Oh, https://unsloth.ai/docs/models/qwen3.5/gguf-benchmarks might be helpful: it benchmarks Q4_K_XL vs Q4_K_M etc., comparing disk space against KL divergence (a proxy for how close a quant is to the original full-precision model).
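For context on the metric: KL divergence compares the quantized model's next-token probability distribution with the full-precision model's at the same position, and 0 means the two are identical. Below is a minimal sketch, assuming you already have both distributions as lists of probabilities over the same vocabulary. How the linked benchmarks collect them is not covered here, and kl_divergence and mean_kl are illustrative names, not an existing API.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats for two discrete distributions over the same vocabulary.

    p: reference probabilities (e.g. the full-precision model's next-token distribution)
    q: the quantized model's probabilities for the same position
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            # eps guards against log(x / 0) if the quantized model assigns zero mass
            total += pi * math.log(pi / max(qi, eps))
    return total

def mean_kl(ref_dists, quant_dists):
    """Average per-position KL over a sequence of token positions."""
    pairs = list(zip(ref_dists, quant_dists))
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)

# Toy data: two positions, three-token vocabulary
ref = [[0.7, 0.2, 0.1], [0.5, 0.5, 0.0]]
quant = [[0.6, 0.3, 0.1], [0.5, 0.5, 0.0]]
print(round(mean_kl(ref, quant), 4))
```

The second position matches exactly and contributes 0, so only the first position's disagreement shows up in the average.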

Q4_0 and Q4_1 were supposed to provide faster inference, but tests showed they reduced accuracy quite a bit, so they are now deprecated.

Q4_K_M and UD-Q4_K_XL are the same, except that _XL is slightly bigger than _M.

The naming convention, from largest to smallest, is _XL > _L > _M > _S > _XS.
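As a small illustration of that convention, here is a sketch that orders quant names by their size suffix. It assumes the suffix is whatever follows the last underscore, and it only makes sense for variants of the same base quant (ordering Q4_K_S against Q5_K_M by suffix alone would be meaningless). The helper names are hypothetical.

```python
# Size suffixes from the convention above, largest first.
SIZE_ORDER = ["XL", "L", "M", "S", "XS"]

def size_rank(quant_name):
    """Index of the size suffix (0 = largest), or None if the name has no recognized suffix."""
    suffix = quant_name.rsplit("_", 1)[-1]
    return SIZE_ORDER.index(suffix) if suffix in SIZE_ORDER else None

def sort_by_size(names):
    """Order names largest-first by suffix; names without a recognized suffix go last."""
    def key(name):
        rank = size_rank(name)
        return (rank is None, rank if rank is not None else 0)
    return sorted(names, key=key)

print(sort_by_size(["Q4_K_S", "UD-Q4_K_XL", "Q4_0", "Q4_K_M"]))
```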

sowbug 2 months ago

Thanks for all your contributions.

Do you think it's time for version numbers in filenames? Or at least a sha256sum of the merged files when they're big enough to require splitting?

Even with gigabit fiber, downloading model files still takes a long time, and I usually merge split files and toss the parts when I'm done. By the time I have a full model, I've often lost track of exactly when I downloaded it, so I can't tell whether I have the latest version. For non-split models, I can compare against the sha256sum on HF, but not for split ones I've already merged. That's why I think we could use version numbers.
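The check described for non-split files comes down to hashing the local file and comparing it with the digest HF shows. Here is a minimal sketch of that local step, equivalent in result to running sha256sum, that streams the file in chunks so a multi-GB GGUF never has to fit in memory. The function names are illustrative, and the published digest is assumed to be copied by hand from the HF file page.

```python
import hashlib

def sha256_of_file(path, chunk_size=8 * 1024 * 1024):
    """Stream a (possibly very large) file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" at EOF
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path, published_hex):
    """Compare the local digest with one copied from the HF UI (case/whitespace-insensitive)."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

On Python 3.11+, hashlib.file_digest(f, "sha256") does the same chunked read for you.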

danielhanchen 2 months ago

Thanks! Oh, we do split files over 50GB; do you mean the files that are split into 50GB shards? Hugging Face XET has an interesting feature where each file is divided into blocks: it computes a SHA256 for each block and only updates the blocks that changed.
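The block-level idea can be sketched as: hash each block, compare hashes between the old and new versions, and transfer only the blocks whose hashes differ. This sketch uses fixed-size blocks over in-memory bytes for simplicity. That is an assumption, and the actual XET implementation (including how it picks block boundaries) is not reproduced here. The helper names are hypothetical.

```python
import hashlib

def block_hashes(data, block_size):
    """SHA-256 of each fixed-size block of data (the last block may be shorter)."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]

def changed_blocks(old, new, block_size):
    """Indices of blocks in new that differ from the same block in old,
    plus any blocks new has beyond the end of old."""
    old_h = block_hashes(old, block_size)
    new_h = block_hashes(new, block_size)
    return [i for i, h in enumerate(new_h) if i >= len(old_h) or old_h[i] != h]

old = b"AAAABBBBCCCC"
new = b"AAAAXXXXCCCCDD"
print(changed_blocks(old, new, 4))  # only blocks 1 and 3 would need re-downloading
```

With fixed-size blocks, inserting a single byte near the start shifts every later block and makes them all "change". That is the usual reason real systems choose block boundaries based on content instead.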
- sowbug 2 months ago
  
  That might be the answer -- something like BitTorrent that updates only the parts that need updating.
  But I do think I'm identifying an unmet need. Qwen3.5-122B-A10B-BF16.gguf, for example: what's its sha256sum? I don't think the HF UI will tell you. I can only download the shards, verify each shard's sha256sum (which the HF UI does provide), llama-gguf-split --merge them, and then sha256sum the merged file myself. But I can't independently confirm that final sha256sum from any other source I trust.
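The workflow above can be sketched as: verify each shard against its published digest, merge with llama-gguf-split --merge (not shown in the code), then hash the merged file and keep that digest next to it. The merged file's SHA-256 cannot be derived from the shard digests, so it has to be computed from the merged file itself. A minimal sketch follows. The function names and the ".sha256" sidecar convention are assumptions, not part of any HF or llama.cpp tooling.

```python
import hashlib
from pathlib import Path

def sha256_of_file(path, chunk_size=8 * 1024 * 1024):
    """Stream a file through SHA-256 in chunks and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_shards(expected):
    """Check downloaded shards against published digests.

    expected: dict mapping shard path -> SHA-256 hex digest copied from the HF UI.
    Returns the paths whose local digest does not match (empty list = all shards OK).
    """
    return [
        path for path, digest in expected.items()
        if sha256_of_file(path) != digest.strip().lower()
    ]

def record_digest(merged_path):
    """Hash the merged file and write '<digest>  <filename>' to '<filename>.sha256',
    the line format that sha256sum -c reads. Returns the digest."""
    merged = Path(merged_path)
    digest = sha256_of_file(merged)
    sidecar = merged.with_name(merged.name + ".sha256")
    sidecar.write_text(f"{digest}  {merged.name}\n")
    return digest
```

In a hypothetical session, you would call verify_shards on shard names like model-00001-of-00003.gguf, merge them, and then call record_digest on the output so that sha256sum -c can re-check it later. The recorded digest is still one you computed yourself, so it keeps track of which file you have but does not give the independent confirmation asked for above.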