Comment by tarruda

7 hours ago

I only tried a very early version of that, back when it was just a llama.cpp fork, and Qwen was certainly better in my tests.

But I was not super impressed with DeepSeek 4 Flash when using it through the official API either, so it doesn't seem to be the quantization's fault. It is a good model, but nothing out of the ordinary in the few benchmarks I ran on it (with full awareness that benchmarks are biased).