Comment by brandall10
10 months ago
That rule of thumb only applies to 8-bit quants at low context. The default for ollama is 4-bit, which puts it at roughly 14GB.
The vast majority of people run somewhere between 4 and 6 bits, depending on system capability. Above 6 bits, the extra accuracy tends not to be worth the performance hit.
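For anyone who wants to sanity-check the numbers, here is a rough back-of-the-envelope sketch. It assumes weight memory is simply parameter count times bits per weight, and it ignores KV cache, context length, and runtime overhead. The 28B parameter count is my own assumption, back-computed from the ~14GB figure at 4 bits; real quant formats usually land slightly above their nominal bit width, so treat the output as a ballpark only.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes).

    Ignores KV cache, activations, and runtime overhead.
    """
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Hypothetical parameter count (an assumption, not stated in the thread):
# chosen so that a nominal 4-bit quant comes out at about 14 GB.
PARAMS_B = 28.0

for bits in (4, 5, 6, 8):
    print(f"{bits}-bit: ~{weight_memory_gb(PARAMS_B, bits):.1f} GB")
```

Under those assumptions, going from 4-bit to 8-bit doubles the weight footprint, which is why the 8-bit rule of thumb overstates what most people actually need.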