Comment by TZubiri

11 hours ago

Have you tried Llama? In my experience it has been strictly better than GPT-OSS, but it might depend on specifically how it's used.

Have you tried GPT-OSS-120b MXFP4 with reasoning effort set to high? Out of all the models I can run within 96GB, it seems to consistently give the best results. What exact Llama model (+ quant, I suppose) have you gotten better results with, and which did you compare it against, the 120b or the 20b variant?
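
For reference, this is roughly how I ask for high reasoning effort when going through the OpenAI-compatible endpoint that llama.cpp's server or LM Studio exposes. A minimal sketch only: the port and model name are placeholders for whatever you've loaded, and it assumes your server build actually forwards the reasoning_effort field (if it doesn't, the request still works, you just get the default effort).

```python
# Sketch: request high reasoning effort from a locally served GPT-OSS model
# through an OpenAI-compatible endpoint. Port and model name are placeholders;
# whether reasoning_effort is honored depends on the server build.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="gpt-oss-120b",       # whatever name your server registered
    reasoning_effort="high",    # "low" / "medium" / "high"
    messages=[{"role": "user", "content": "Walk through a tricky refactor step by step."}],
)
print(resp.choices[0].message.content)
```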

  • How are you running this? I've had issues with Opencode formulating bad messages when the model runs on llama.cpp: Jinja threw a bunch of errors and GPT-OSS couldn't make tool calls. There's an issue for this on Opencode's repo, but it seems like it's been waiting for a couple of weeks.

    > What exact llama model (+ quant I suppose) is it that you've had better results against

    Not llama, but Qwen3-coder-next is on top of my list right now. Q8_K_XL. It's incredible (not just for coding).

    • Again, you're not specifying which GPT-OSS you're talking about; there are two versions, 20b and 120b. Not to mention that if you have a consumer GPU, you're most likely running it with additional quantization too, but you're not saying which quant.

      > Jinja threw a bunch of errors and GPT-OSS couldn't make tool calls.

      This was an issue for a week or two when GPT-OSS initially launched, as none of the inference engines had properly implemented support for it, especially around tool calling. I'm running GPT-OSS-120b MXFP4 both with LM Studio and directly with llama.cpp; recent versions handle it well and I get no errors.

      However, when I've tried either the 120b or the 20b with additional quantization (not the "native" MXFP4 ones), I've seen them have trouble with the tool syntax too.
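
      To make that concrete, this is the kind of minimal check I run against the local server to see whether tool calls come back well-formed. It's just a sketch: the endpoint, model name and the toy get_weather tool are placeholders, not anything Opencode-specific.

      ```python
      # Sketch: sanity-check tool calling against a local OpenAI-compatible server.
      # The get_weather tool is a throwaway example; the point is whether the model
      # returns a well-formed tool_calls entry instead of mangled tool syntax.
      from openai import OpenAI

      client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

      tools = [{
          "type": "function",
          "function": {
              "name": "get_weather",
              "description": "Get the current weather for a city.",
              "parameters": {
                  "type": "object",
                  "properties": {"city": {"type": "string"}},
                  "required": ["city"],
              },
          },
      }]

      resp = client.chat.completions.create(
          model="gpt-oss-120b",
          messages=[{"role": "user", "content": "What's the weather in Oslo right now?"}],
          tools=tools,
      )
      # A broken template or quant usually shows up here as None or malformed arguments.
      print(resp.choices[0].message.tool_calls)
      ```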

      > Not llama

      What does your original comment mean, then? You said Llama was "strictly" better than GPT-OSS; which specific model variant are you talking about, or did you miswrite somehow?

      3 replies →