
Comment by andai

3 days ago

>Meanwhile GGUF Q2 and Q3 quantizations on llama.cpp keep getting better

Can you tell me more about this? It's been about a year since I looked into it, but at the time performance seemed to drop hard below Q4, so I'd love to see what's changed.

Also, what's a good way to run them? I mostly use Ollama, which only goes down to Q4. I think it supports HF URLs, though?
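For what it's worth, here's a sketch of how that HF-URL route might look, assuming Ollama's `hf.co/<user>/<repo>:<quant>` pull syntax and llama.cpp's `llama-cli -hf` flag; the repo name below is just an illustrative example, not a recommendation:

```shell
# Example GGUF repo and quant tag -- substitute any real GGUF repo
# from Hugging Face you actually want to run.
REPO="bartowski/Meta-Llama-3.1-8B-Instruct-GGUF"
QUANT="Q2_K"

# Ollama can reportedly pull GGUF quants straight from Hugging Face
# using a hf.co/<user>/<repo>:<quant> model reference:
MODEL_REF="hf.co/${REPO}:${QUANT}"
echo "$MODEL_REF"

# Requires Ollama installed locally; downloads the model on first run:
#   ollama run "$MODEL_REF"

# llama.cpp's CLI has a similar shortcut via its -hf flag:
#   llama-cli -hf "${REPO}:${QUANT}" -p "Hello"
```

The actual `ollama run` / `llama-cli` invocations are commented out since they need the tools installed and a model download.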