The benchmarks in the model card are purported to be measurements of model quality (ability to perform tasks with few errors), not speed.
They almost certainly run these benchmarks on their own cloud infrastructure (Alibaba afaik), which is typically not hardware that even the most enthusiastic homelab hobbyist can afford.
The benchmarks are from the unquantized model they release.
This will only run on server hardware, some workstation GPUs, or some 128GB unified memory systems.
It’s a situation where if you have to ask, you can’t run the exact model they released. You have to wait for quantizations to smaller sizes, which come in a lot of varieties and have quality tradeoffs.
This would likely run fine in just 96 GB of VRAM, by my estimation, which is well within reach of an enthusiastic hobbyist with a few thousand dollars of disposable income.
Quantizations are already out: https://huggingface.co/unsloth/Qwen3.6-27B-GGUF
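For anyone wanting to sanity-check the VRAM estimates above, here's a minimal back-of-envelope sketch. The bits-per-weight figures for the GGUF quant levels are rough community numbers, and the 27B parameter count is taken from the linked repo name; real memory usage also depends on KV cache, context length, and the runtime.

```python
# Back-of-envelope VRAM estimate for model weights at various quantizations.
# Illustrative only: ignores KV cache, activation buffers, and runtime overhead.

GIB = 1024 ** 3

def weight_vram_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate VRAM for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / GIB

# Rough bits-per-weight for common GGUF quant levels (approximate values).
quants = {"FP16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.85}

for name, bpw in quants.items():
    print(f"{name}: ~{weight_vram_gib(27e9, bpw):.1f} GiB for a 27B model")
```

By this arithmetic, even the unquantized FP16 weights of a 27B model come in around 50 GiB, and a 4-bit quant near 15 GiB, which is why a 96 GB setup has comfortable headroom.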