Comment by Aurornis

10 hours ago

The benchmarks are from the unquantized model they released.

This will only run on server hardware, some workstation GPUs, or some 128GB unified memory systems.

It’s a case of "if you have to ask, you can’t run it", at least for the exact model they released. You’ll have to wait for quantized versions at smaller sizes, which come in many varieties, each with its own quality tradeoffs.
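
For a rough sense of why quantization changes what hardware is viable, here is a back-of-the-envelope sketch. The 100B parameter count is a made-up placeholder, not the size of the model discussed here, and treating "unquantized" as 16 bits per weight is also an assumption. It counts weights only and ignores KV cache, activations, and runtime overhead, so real requirements are higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical parameter count for illustration only (an assumption,
# not the size of the model under discussion).
HYPOTHETICAL_PARAMS = 100e9

# Assumed precisions: 16-bit stands in for "unquantized" here.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    gb = weight_memory_gb(HYPOTHETICAL_PARAMS, bits)
    print(f"{label}: ~{gb:.0f} GB for weights")
```

Plug in the real parameter count and the bits per weight of whichever quantization you are considering, then compare against your available memory with some headroom left over.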