
Comment by operatingthetan

19 hours ago

I think we are inevitably headed toward using cheap Chinese models like Kimi, GLM, and MiniMax for the bulk of engineering tasks. Within 3-6 months they will be at Opus 4.6 level.

This was literally my task today, before reading this update: trying out Qwen 9B locally, with pi or opencode, on my MacBook (albeit a bit memory-constrained at 18GB).

  • The MiniMax coding plan is $10 a month for roughly 3x the usage allowed on the $20 Claude Pro CLI plan. That would be a good place to start. Only 200k context, though.

    • MiniMax has its own issues: server overloads, API errors, and failure to adhere even to the system prompt. It can happily work for hours and get nothing done.


  • Please report back, would be very interested in your findings.

    • I ran OpenCode + GLM-5.1 for three weeks during my vacation. It’s okay. It thinks a lot more to reach a result similar to Claude’s, so it’s slower. It’s congested during peak hours, and it has quirks as the context gets close to full.

      But if you’re stuck with no better option, it’s better than local models or no model at all.

      I have to say, OpenCode’s OpenUI has taught me what modern TUIs can be like. Claude’s TUI feels more like it’s been grown than designed. I’m playing around with TUI widgets, trying to recreate and improve on that experience.


Kimi 2.6 performs roughly at Opus 4.6 level, when it works: depending on the task, a bit better or a bit worse. And it's MUCH cheaper.

  • From this morning: I had a single Go file with ~100 LOC. I asked it to add debug prints; it thought for 5+ minutes, generated ~1M output tokens, and did not actually update my file.

Anthropic will kick and scream, as those models are often distilled from its latest ones and are cutting into its margins. Not that Anthropic's hands are clean either; it's just a different type of stealing, an approved one :-)

How challenging are these to set up and run locally?

  • Getting them running is easy (check out LM Studio, or ask one of the models for recommendations). The real question is whether you have the hardware to make them run fast enough to be useful.
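A rough back-of-envelope check before downloading anything (my own sketch, not from the thread): resident memory is roughly parameter count times bytes per weight, plus some overhead for the KV cache and runtime buffers. The 1.2 overhead factor below is an assumption, not a measured number.

```python
# Sketch: estimate whether a quantized model fits in available RAM.
# All numbers are ballpark assumptions, not vendor specs.

def model_ram_gb(params_b: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Approximate resident memory in GB for a local model.

    params_b        parameter count in billions
    bits_per_weight e.g. 4 for Q4 quantization, 16 for fp16
    overhead        fudge factor for KV cache, activations, buffers
    """
    bytes_needed = params_b * 1e9 * (bits_per_weight / 8)
    return bytes_needed * overhead / 1e9

# A 9B model on an 18GB MacBook:
q4 = model_ram_gb(9, 4)    # ~5.4 GB  -> fits with room to spare
fp16 = model_ram_gb(9, 16) # ~21.6 GB -> does not fit in 18 GB
print(q4, fp16)
```

By this estimate, a Q4-quantized 9B model is comfortable on 18GB of memory, while the unquantized fp16 weights alone would not fit; the speed question is separate and depends on memory bandwidth.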