Comment by sfifs

1 day ago

Qwen is definitely the model to beat as of mid-2026. I didn't benchmark with SWE, as my use cases are OpenClaw [1], but I found both Qwen 3.6 35B A3B and, more impressively, Qwen 3.5 122B A10B starting to be competitive with closed flash models. The NVFP4 quant of the latter is what I'm running now on DGX.

[1] https://srinathh.medium.com/mid-size-local-models-are-now-co...

How does Qwen compare to Deepseek or Kimi? I haven't spent much time with Qwen, but I find Deepseek to be mostly comparable to Opus for my pet projects. Kimi K2.6 did a lot of stupid stuff and talked to itself a lot: "let me do X... Wait, X doesn't make sense because the user explicitly said Y".

Deepseek seems to seek first to understand before going off.

  • Deepseek is too large for me to self-host on Spark. I was actually using Deepseek as my cloud backup and it performed well, but then I read the T&C, which don't give data protection guarantees as strong as Google's and Alibaba's. Kimi is again massive, its cloud-hosted APIs are fairly expensive by comparison, and it also has weak T&C, so I have only benched it but not tested it. In general I found that with OpenClaw it works better to turn Reasoning off.

    I think there's possibly value in fine-tuning Qwen 3.5 on my OpenClaw turns log to see if performance improves. The one recent model I haven't tested yet is Nemotron 3 Super, which I might bench soon.