
Comment by dajonker

8 hours ago

I don't really have the hardware to try it out, but I'm curious how Qwen3.5 stacks up against Gemma 4 in a comparison like this. Especially this model, fine-tuned for tool calling, which has more than 500k downloads as of this moment: https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-...

I'm just some guy on hackernews, but I actually did try this on my DGX Spark. I went back to Gemma 4 after a few rounds. My orchestration model kept having to send the Qwen model back to fix mistakes that Gemma wouldn't have made. I wound up with less working code per hour due to the mistakes.

Technically, I use OpenWebUI with Ollama, so I used the weights below, but it should be the same.

https://ollama.com/kwangsuklee/Qwen3.5-27B-Claude-4.6-Opus-R...

  • I'd be super interested to hear about your workflow with OpenWebUI. I haven't figured out how to use it for anything other than the basic chatbot UI, and I haven't been able to hook anything else into it.

    • What I said above was a bit confused. What I've actually done is connect OpenCode and OpenWebUI both to Ollama. I just use OpenWebUI to manage the models and for testing/etc. Once you have it working it's very nice. You can pull a new model just by typing the name and waiting while it downloads, etc.

      Connecting Ollama to OpenCode and OpenWebUI is relatively trivial. In OpenWebUI there's a nice GUI. In OpenCode you just edit ~/.config/opencode/opencode.json to look something like this. The model names have to match the ones you see in OpenWebUI, but the friendly "name" key can be whatever you need to be able to recognize it.

        {
          "$schema": "https://opencode.ai/config.json",
          "provider": {
            "ollama": {
              "npm": "@ai-sdk/openai-compatible",
              "name": "Ollama",
              "options": {
                "baseURL": "http://localhost:11434/v1"
              },
              "models": {
                "qwen3.5:122b": {
                  "name": "Qwen 3.5 122b"
                },
                "qwen3-coder:30b": {
                  "name": "Qwen 3 Coder"
                },
                "gemma4:26b": {
                  "name": "Gemma 4"
                }
              }
            }
          }
        }
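If OpenCode doesn't pick the models up, a malformed config file is the usual culprit. A quick sanity check (just a sketch, assuming the path above and the config structure shown; `validate_config` is a hypothetical helper, not part of OpenCode) is to parse the file and list the model IDs it declares:

```python
import json
import pathlib

# Path from the comment above; adjust if your config dir differs.
cfg_path = pathlib.Path.home() / ".config" / "opencode" / "opencode.json"

def validate_config(path):
    """Parse the OpenCode config and return the Ollama model IDs it declares.

    Raises json.JSONDecodeError on malformed JSON and KeyError if the
    provider/ollama/models nesting shown above is missing.
    """
    cfg = json.loads(pathlib.Path(path).read_text())
    models = cfg["provider"]["ollama"]["models"]
    return sorted(models)

if cfg_path.exists():
    print(validate_config(cfg_path))
```

The model IDs printed here are exactly the strings that have to match what `ollama list` / OpenWebUI show.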

It's rather surprising that a solo dev can squeeze more performance out of a model with such humble resources vs a frontier lab. I'm skeptical of claims that such a fine-tuned model is "better" -- maybe on certain benchmarks, but overall?

FYI the latest iteration of that finetune is here: https://huggingface.co/Jackrong/Qwopus3.5-27B-v3

  • > can squeeze more performance out of a model with rather humble resources vs a frontier lab.

    That's the idea behind distillation. They are finetuning it on traces produced by opus. This is poor man's distillation (and the least efficient) and it still works unreasonably well for what it costs.
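The mechanics of that recipe are simple: collect the teacher's completions as prompt/response pairs, then fine-tune the student on them with an ordinary SFT trainer. A minimal sketch of the data-prep half (the traces here are made-up placeholders, and the `messages` JSONL layout is just one common SFT format, not anything Qwen-specific):

```python
import json

# Hypothetical teacher traces: prompts paired with the stronger
# model's answers, collected however you like (API logs, eval runs, ...).
traces = [
    {"prompt": "Write a function that reverses a string.",
     "response": "def reverse(s):\n    return s[::-1]"},
    {"prompt": "Explain what a mutex is in one sentence.",
     "response": "A mutex is a lock that only one thread can hold at a time."},
]

def to_sft_jsonl(traces, path):
    """Write teacher traces as JSONL, one chat-style record per line."""
    with open(path, "w") as f:
        for t in traces:
            record = {"messages": [
                {"role": "user", "content": t["prompt"]},
                {"role": "assistant", "content": t["response"]},
            ]}
            f.write(json.dumps(record) + "\n")

to_sft_jsonl(traces, "distill.jsonl")
```

The expensive part is generating enough teacher traces; the training step itself is a standard supervised fine-tune on the resulting file.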

  • I feel that's a little bit misleading.

    That link doesn't have much affiliation with Qwen or anyone who produces/trained the Qwen models. That doesn't mean it's not good or safe, but it seems quite subjective to suggest it's the latest or greatest Qwen iteration.

    I can see huggingface turning into the same poisoned watering-hole as NPM if people fall into the same habits of dropping links and context like that.

    • I'm not saying it's the latest Qwen iteration - that would be Qwen3.6.

      I'm saying it's the latest iteration of the finetuned model mentioned in the parent comment.

      I'm also not suggesting that it's "the latest and greatest" anything. In fact, I think it's rather clear that I'm suggesting the opposite? As in - how can a small fine tune produce better results than a frontier lab's work?
