Comment by zargon

6 hours ago

These calculators are almost entirely useless. They don't understand specific model architectures. Even the ones that try to support only specific models (like the apxml one) get it very wrong a lot of the time.

For example, when I give the one you linked a Qwen3.5 27B Q4_K_M GGUF [0], it says it will require 338 GB of memory with a 16-bit KV cache. That is wrong by over an order of magnitude.

[0] https://huggingface.co/bartowski/Qwen_Qwen3.5-27B-GGUF/resol...
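
For a sense of scale, here is a rough back-of-envelope sketch of the arithmetic a calculator has to get right. Every number in it is an assumption for illustration (a hypothetical dense-attention config with 64 layers, 8 KV heads, head dimension 128, a 32k context, and an average of about 4.5 bits per weight), not the real architecture of the model above. Models with hybrid or sliding-window attention layers can need much less cache than this, which is one place a generic formula goes wrong.

```python
GIB = 1024 ** 3


def weights_bytes(n_params: int, bits_per_weight: float) -> float:
    """Approximate size of quantized weights, ignoring file metadata."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """KV cache size when every layer uses full attention.

    The leading 2 accounts for storing both keys and values.
    bytes_per_elem=2 corresponds to a 16-bit cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


# Hypothetical values, chosen only to illustrate the order of magnitude.
weights = weights_bytes(27_000_000_000, 4.5)
kv = kv_cache_bytes(n_layers=64, n_kv_heads=8, head_dim=128, context_len=32_768)
total_gib = (weights + kv) / GIB

print(f"weights: {weights / GIB:.1f} GiB")
print(f"kv cache: {kv / GIB:.1f} GiB")
print(f"total: {total_gib:.1f} GiB")
```

Under these assumptions the total is roughly 22 GiB, so even a deliberately pessimistic all-full-attention estimate is nowhere near hundreds of GB.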

2 comments

gdevenyi 4 hours ago

Mine does https://github.com/gdevenyi/huggingface-estimate

zargon 3 hours ago

Excellent job with this! I tried a few combinations that completely fail on other calculators, and yours gets VRAM usage pretty much spot on. Even the performance estimate is in the ballpark of what I see with mixed VRAM / RAM workloads.
It's a shame that search is so polluted these days that it's impossible to find good tools like yours.