
Comment by andai

3 days ago

>Meanwhile GGUF Q2 and Q3 quantizations on llama.cpp keep getting better

Can you tell me more about this? It's been about a year since I looked into it, but at the time performance seemed to drop hard below Q4, so I'd love to see what's changed.

Also, what's a good way to run them? I mostly use Ollama, which only goes down to Q4. I think it supports HF URLs, though?
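For what it's worth, here's a sketch of how that HF-URL route might look, assuming Ollama's `hf.co/<user>/<repo>:<quant>` pull syntax and llama.cpp's `llama-cli -hf` flag; the repo name below is just an illustrative example, not a recommendation:

```shell
# Example GGUF repo and quant tag -- substitute any real GGUF repo
# from Hugging Face you actually want to run.
REPO="bartowski/Meta-Llama-3.1-8B-Instruct-GGUF"
QUANT="Q2_K"

# Ollama can reportedly pull GGUF quants straight from Hugging Face
# using a hf.co/<user>/<repo>:<quant> model reference:
MODEL_REF="hf.co/${REPO}:${QUANT}"
echo "$MODEL_REF"

# Requires Ollama installed locally; downloads the model on first run:
#   ollama run "$MODEL_REF"

# llama.cpp's CLI has a similar shortcut via its -hf flag:
#   llama-cli -hf "${REPO}:${QUANT}" -p "Hello"
```

The actual `ollama run` / `llama-cli` invocations are commented out since they need the tools installed and a model download.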