Comment by naasking

4 hours ago

> It's not uncommon to see a gemma vs qwen comparison, where qwen does a bit better, but spent 22 minutes on the task, while gemma aligned the buttons wrong, but only spent 4 minutes on the same prompt.

Yes, Gemma 4 is very promising for its strong performance and token efficiency, but unfortunately its sliding window attention has a fatal flaw that makes me seriously hesitate to rely on it. See the series of videos on this channel:

https://youtu.be/ONQcX9s6_co?si=Yt55_N4DcNLstnGS

On top of Qwen3.5/3.6's superior recall, its attention mechanism dramatically reduces KV cache requirements, so you can fit longer sessions in the same VRAM (or more concurrent sessions if you have agents running), which is critical for local hosting.
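
To make the KV cache point concrete, here is a back-of-the-envelope sketch in Python. All the numbers (layer counts, KV heads, head dimension, fp16 cache) are made up for illustration and are not the actual Qwen or Gemma configurations. It also assumes a hybrid design where only some layers keep a per-token KV cache, and it ignores whatever fixed-size state the remaining layers would carry.

```python
def kv_cache_bytes(tokens, kv_layers, kv_heads, head_dim, bytes_per_value=2):
    """Rough KV cache size for one session.

    kv_layers counts only the layers that store keys and values for
    every token in the context. bytes_per_value=2 assumes an fp16 cache.
    """
    # Factor of 2: one tensor for keys, one for values.
    return 2 * kv_layers * kv_heads * head_dim * tokens * bytes_per_value


GIB = 2**30
CONTEXT = 32_768  # tokens in the session (hypothetical)

# Hypothetical model A: all 48 layers keep a full per-token KV cache.
full = kv_cache_bytes(CONTEXT, kv_layers=48, kv_heads=8, head_dim=128)

# Hypothetical model B: only 12 of 48 layers keep a per-token KV cache.
hybrid = kv_cache_bytes(CONTEXT, kv_layers=12, kv_heads=8, head_dim=128)

print(f"full attention: {full / GIB:.2f} GiB")
print(f"hybrid:         {hybrid / GIB:.2f} GiB")
print(f"sessions per VRAM budget, hybrid vs full: {full // hybrid}x")
```

With these made-up numbers the full-attention cache needs 6 GiB at 32k tokens and the hybrid one 1.5 GiB, so the same VRAM would hold roughly 4x the context or 4x the concurrent sessions.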

At this point Qwen3.6 with thinking mode disabled seems like the best balance.
