Comment by selcuka
2 days ago
I use LMStudio for running models locally (macOS) and it tries to estimate whether the model would fit in my GPU memory (which is the same thing as main memory for Macs).
The Q4_K_S quantized version of Microsoft Fara 7B is a 5.8GB download. I'm pretty sure it would work on a 12GB Nvidia card. Even the Q8 one (9.5GB) could work.
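Roughly the kind of check it seems to be doing (just a back-of-the-envelope sketch, not LMStudio's actual logic; the 1.2x overhead factor for KV cache and runtime is a guess on my part):

    # Assumes the quantized file loads roughly 1:1 into VRAM,
    # plus guessed headroom for KV cache / runtime overhead.
    MODEL_FILE_GB = {               # download sizes quoted above (decimal GB)
        "Fara-7B Q4_K_S": 5.8,
        "Fara-7B Q8": 9.5,
    }
    VRAM_GB = 12                    # nominal capacity of the card
    OVERHEAD = 1.2                  # assumed multiplier, not a measured value

    for name, size_gb in MODEL_FILE_GB.items():
        needed = size_gb * OVERHEAD
        verdict = "fits" if needed <= VRAM_GB else "too big"
        print(f"{name}: ~{needed:.1f} GB needed -> {verdict} in {VRAM_GB} GB")

Even with that padding, both quants come in under 12 GB.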
It's a 12 GiB card, not 12 GB. That extra tail works out to roughly an extra 800 MB.
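The arithmetic behind that rough 800 MB figure (decimal GB vs binary GiB):

    gib = 12 * 2**30             # 12 GiB in bytes: 12,884,901,888
    gb  = 12 * 10**9             # 12 GB in bytes:  12,000,000,000
    extra_bytes = gib - gb       # 884,901,888 bytes
    print(extra_bytes / 10**6)   # ~885 MB, i.e. roughly the extra 800 MB above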
Fair, but the download sizes given above are also in GiB.
Also, these calculations are very approximate anyway. The 6.67% difference will not change the fact that 5.8 << 12.
No, file sizes are normally given in raw bytes. I've downloaded dozens of models from huggingface, and the difference has always favoured the VRAM size in GiB.