Comment by selcuka
2 days ago
I use LMStudio for running models locally (macOS) and it tries to estimate whether the model would fit in my GPU memory (which is the same thing as main memory for Macs).
The Q4_K_S quantized version of Microsoft Fara 7B is a 5.8GB download. I'm pretty sure it would work on a 12GB Nvidia card. Even the Q8 one (9.5GB) could work.
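Roughly the kind of check it seems to be doing (just a back-of-the-envelope sketch, not LMStudio's actual logic; the 1.2x overhead factor for KV cache and runtime is a guess on my part):

    # Assumes the quantized file loads roughly 1:1 into VRAM,
    # plus guessed headroom for KV cache / runtime overhead.
    MODEL_FILE_GB = {               # download sizes quoted above (decimal GB)
        "Fara-7B Q4_K_S": 5.8,
        "Fara-7B Q8": 9.5,
    }
    VRAM_GB = 12                    # nominal capacity of the card
    OVERHEAD = 1.2                  # assumed multiplier, not a measured value

    for name, size_gb in MODEL_FILE_GB.items():
        needed = size_gb * OVERHEAD
        verdict = "fits" if needed <= VRAM_GB else "too big"
        print(f"{name}: ~{needed:.1f} GB needed -> {verdict} in {VRAM_GB} GB")

Even with that padding, both quants come in under 12 GB.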
It's a 12 GiB card, not 12 GB. That extra tail works out to roughly an extra 800 MB.
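The arithmetic behind that rough 800 MB figure (decimal GB vs binary GiB):

    gib = 12 * 2**30             # 12 GiB in bytes: 12,884,901,888
    gb  = 12 * 10**9             # 12 GB in bytes:  12,000,000,000
    extra_bytes = gib - gb       # 884,901,888 bytes
    print(extra_bytes / 10**6)   # ~885 MB, i.e. roughly the extra 800 MB above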
Fair, but the download sizes given above are also in GiB.
Also, these calculations are very approximate anyway. The 6.67% difference will not change the fact that 5.8 << 12.
No, file sizes are normally given in raw bytes. I've downloaded dozens of models from huggingface, and the difference has always favoured the VRAM size in GiB.