Comment by kofu

1 day ago

My experience also aligns with this. I'm running Gemma4 31B on a 4090 through llama.cpp with Unsloth models. I also run Qwen 3.6. Qwen is good for thinking and planning because it is faster, but Gemma4's generated code is much higher quality on the first try (Rust, C++ and C#), so it needs fewer revisions before it reaches a level I'm comfortable merging.

I second the Unsloth models. I use them instead of the Blackwell-oriented NVFP4 models because, empirically, they offer top quality and performance.
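For context on the format being compared: NVFP4 stores each value as a 4-bit E2M1 float and shares one scale factor across each block of 16 values (NVIDIA's format stores that block scale as FP8 E4M3 and adds a per-tensor FP32 scale). The sketch that follows fake-quantizes a list of weights under simplifying assumptions: the block scale is kept in full precision, the per-tensor scale is omitted, and ties round toward the smaller magnitude. It illustrates where the rounding error comes from rather than reproducing NVIDIA's kernels bit for bit; `fake_quantize_nvfp4` and `round_to_e2m1` are hypothetical names.

```python
# Magnitudes representable by FP4 E2M1 (the sign bit is handled separately).
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # NVFP4 shares one scale across each block of 16 values


def round_to_e2m1(x: float) -> float:
    """Round x to the nearest E2M1 value.

    Simplification: ties go to the smaller magnitude (min() keeps the first
    minimum in the ascending tuple); hardware uses round-to-nearest-even.
    """
    mag = min(E2M1_VALUES, key=lambda v: abs(abs(x) - v))
    return mag if x >= 0 else -mag


def fake_quantize_nvfp4(values: list[float]) -> list[float]:
    """Quantize then dequantize block by block, returning the lossy values."""
    out: list[float] = []
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        amax = max(abs(v) for v in block)
        # Per-block scale that maps the block's largest magnitude onto 6.0,
        # the E2M1 maximum. Assumption: kept as a Python float here instead
        # of being rounded to FP8 E4M3, and no per-tensor scale is applied.
        scale = amax / 6.0 if amax > 0 else 1.0
        out.extend(round_to_e2m1(v / scale) * scale for v in block)
    return out


# Illustrative, made-up weights: two blocks of 16 values.
weights = [0.01 * i for i in range(-16, 16)]
approx = fake_quantize_nvfp4(weights)
print(max(abs(w - a) for w, a in zip(weights, approx)))
```

Because the widest gap between E2M1 magnitudes is 2 (between 4 and 6), the error per element stays within one block scale; blocks containing a single outlier get a large scale and lose precision on every other value, which is one reason post-training after quantization matters.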

  • NVFP4 will be better if the model provider actually does proper post-training after quantizing.

    • Which basically only Nvidia does, because it’s very expensive.

      I'm currently working on QAD (quantization-aware distillation) of the smaller Qwen 3.5 models, from an FP16 teacher to an NVFP4 student, hoping to eventually apply it to Qwen 3.6 27B… It's harder to get right than I expected, though!
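In quantization-aware distillation, the quantized student is fine-tuned to match the FP16 teacher's outputs rather than ground-truth labels. A minimal sketch of the per-token loss such a loop might minimize, assuming a plain KL divergence between the teacher's and student's token distributions (a common choice, not necessarily the setup described above). It shows only the loss, not the fake-quantized forward pass or the optimizer; `log_softmax` and `distillation_kl` are hypothetical helper names.

```python
import math


def log_softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable log-softmax over one token position's logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def distillation_kl(teacher_logits: list[float],
                    student_logits: list[float],
                    temperature: float = 1.0) -> float:
    """KL(teacher || student) for one token position.

    Assumed QAD setup: the teacher is the FP16 model and the student runs its
    forward pass with (fake-)quantized NVFP4 weights; minimizing this value
    pulls the student's output distribution toward the teacher's.
    """
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


# Illustrative logits for a three-token vocabulary.
teacher = [2.0, 1.0, 0.1]
print(distillation_kl(teacher, teacher))          # student matches teacher
print(distillation_kl(teacher, [0.1, 1.0, 2.0]))  # badly mismatched student
```

Working in log space avoids taking `log(0)` when a student probability underflows, and the loss is zero only when the two distributions agree.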