Comment by DAFtwinTurbo

3 months ago

Hi Hacker News,

I wrote a short blog post about a little experiment I did last weekend. The goal was to see whether I could get more tok/s out of llama.cpp running the latest Qwen3.5 models. I got a 5.5x performance increase by (1) recompiling with optimization flags and (2) using ik_llama.cpp!