Comment by nzeid
17 hours ago
A few days ago I switched again from Qwen3.6 to Gemma 4. For personal use I've experienced better average performance with the 26B version of the latter than with the 27B of the former.
For someone who's been running local models for a long while, these are very very exciting times.
Oh, that's fascinating. 3.6 27B is pretty damned good, but slow in wall-clock terms on my DGX Spark-alike. It generates huge reams of thinking before it gets to the (usually correct!) answer, so wall-clock time is rough for tasks even at ~20 tk/s.
I'm surprised the 26B-A4B is better? It should be faster too, interesting. I'm excited to try 31B with MTP, because MTP-2 is what makes 27B bearable on the GB10.
What are you using it for? Agent-based coding, or something else?
I've been thinking about doing more of this too. What spec machine are you running? And are you using long-running autonomous agents or more of the IDE/co-pilot style of collaboration?
I've been swapping between these as well.
However, I find Qwen unbeatable for tool calling. I think Gemma wasn't trained on that at all.
Gemma certainly was trained for tool calling, but the implementation in llama.cpp has been troubled because Gemma uses a different chat template format. The processor from the transformers library works fine though.
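If it helps, this is roughly the shape of it (a minimal sketch; the model id is a placeholder, and it assumes a checkpoint whose chat template actually accepts a `tools` argument):

```python
# Render the tool-calling prompt with the transformers chat template
# instead of relying on llama.cpp's built-in template handling.
from transformers import AutoTokenizer

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    ...

# Placeholder model id -- substitute whichever Gemma checkpoint you run.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-27b-it")

messages = [{"role": "user", "content": "What's the weather in Oslo?"}]

# transformers turns the function signature and docstring into a JSON
# schema and injects it the way the model's own template expects.
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)
```

You can then feed the rendered prompt to llama.cpp's raw completion endpoint and sidestep its template logic entirely.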
Oh I must've missed this.
The AI space moves so fast! I'll check it out again.
I'm using llama.cpp with Gemma and tool calling is mission critical. It's perfectly fine on my end.
There are definitely differences in the eagerness to tool-call that you'll need to manage. And for all local models I've ever used, I've had to micromanage the tools provided by servers to eliminate any possibility that they reach for something wonky or confusing.
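Concretely, "micromanage" for me means an explicit allowlist that strips everything else before the request ever reaches the model (a minimal sketch; the tool names are made up):

```python
# Keep an explicit allowlist of tool names and drop everything else
# before the tool list is sent to the model. Names are illustrative.
ALLOWED = {"read_file", "list_directory", "run_tests"}

def filter_tools(tools: list[dict]) -> list[dict]:
    """Keep only allowlisted tools (OpenAI-style tool dicts), so the
    model can't reach for something wonky or confusing."""
    return [t for t in tools if t.get("function", {}).get("name") in ALLOWED]
```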
> However, I find Qwen unbeatable for tool calling. I think Gemma wasn't trained on that at all.
The Gemma 4 chat template seems to have had multiple issues, at least with llama.cpp, and I'm not sure they're all fixed yet. It assumed simple types for tool parameters, for example.
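To illustrate (made-up tool, just to show the shape): a parameter like `edits` below, an array of objects rather than a bare string, is exactly where a template that assumes simple types falls over.

```python
# OpenAI-style tool definition with a non-simple parameter type.
# A chat template that assumes every parameter is a plain string
# will render the "edits" schema incorrectly.
tool = {
    "type": "function",
    "function": {
        "name": "apply_edits",
        "description": "Apply a batch of line edits to a file.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "edits": {  # array of objects, not a "simple" type
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "line": {"type": "integer"},
                            "text": {"type": "string"},
                        },
                    },
                },
            },
            "required": ["path", "edits"],
        },
    },
}
```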