Comment by mottosso

6 hours ago

Specs for whatever they used to achieve the benchmarks would be a good start.

The benchmarks in the model card purport to measure model quality (the ability to perform tasks with few errors), not speed.

They almost certainly run these benchmarks on their own cloud infrastructure (Alibaba afaik), which is hardware that even the most enthusiastic homelab hobbyist typically can't afford.

The benchmarks are from the unquantized model they release.

That unquantized model will only run on server hardware, some workstation GPUs, or some 128GB unified memory systems.

It’s a situation where, if you have to ask, you can’t run the exact model they released. You have to wait for quantized versions at smaller sizes, which come in a lot of varieties and trade away some quality for the smaller footprint.
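
For a rough sense of why quantization matters here, a back-of-the-envelope sketch in Python. The 100B parameter count and the 20% overhead for KV cache and activations are made-up illustration values, not figures from the model card; only the arithmetic (parameters times bits per weight, divided by 8) is standard.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits(params_billions: float, bits_per_weight: float, memory_gb: float,
         overhead_fraction: float = 0.2) -> bool:
    """Check whether weights plus an assumed overhead fit in memory_gb.

    overhead_fraction is a hypothetical allowance for KV cache,
    activations, and runtime buffers; real overhead varies a lot.
    """
    needed = weight_memory_gb(params_billions, bits_per_weight) * (1 + overhead_fraction)
    return needed <= memory_gb


if __name__ == "__main__":
    # Hypothetical 100B-parameter model on a 128GB unified memory system.
    for bits in (16, 8, 4):
        size = weight_memory_gb(100, bits)
        print(f"{bits}-bit: ~{size:.0f} GB weights, fits in 128GB: {fits(100, bits, 128)}")
```

Under those assumptions the 16-bit weights alone come to about 200 GB, which is why the release as-is is out of reach for most home setups, while a 4-bit quantization of the same hypothetical model would need roughly 50 GB.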