Comment by abustamam

1 day ago

How does Qwen compare to Deepseek or Kimi? I haven't spent much time with Qwen, but I find Deepseek to be mostly comparable to Opus for my pet projects. Kimi k2.6 did a lot of stupid stuff and talked to itself a lot: "let me do X... Wait, X doesn't make sense because the user explicitly said Y".

Deepseek seems to seek first to understand before going off.

Deepseek is too large for me to self-host on Spark. I was actually using Deepseek as my cloud backup and it performed well, but then I read the T&C, which don't give data protection guarantees as strong as Google's and Alibaba's. Kimi is again massive, its cloud-hosted APIs are fairly expensive by comparison, and it also has weak T&C, so I have only benched it, not tested it. In general, I found that OpenClaw works better with Reasoning turned off.

I think there may be value in fine-tuning Qwen 3.5 on my OpenClaw turns log to see if performance improves. The one recent model I haven't tested yet is Nemotron 3 Super, which I might bench soon.