Comment by girvo
5 days ago
I have an Asus GX10 that I run Qwen3.5 122B A10B on, and I use it for coding through the Pi coding agent (and my own); I have to put more work in to ensure that the model verifies what it does, but if you do so its quite capable.
It makes using my Claude Pro sub actually feasible: write a plan with it, pick it up with my local model and implement it, now I'm not running out of tokens haha.
Is it worth it from a unit economics POV? Probably not, but I bought this thing to learn how to deploy and serve models with vLLM and SGLang, and to learn how to fine tune and train models with the 128GB of memory it gets to work with. Adding up two 40GB vectors in CUDA was quite fun :)
I also use Z.ai's Lite plan for the moment for GLM-5.1 which is very capable in my experience.
I was using Alibaba's Lite Coding Plan... but they killed it entirely after two months haha, too cheap obviously. Or all the *claw users killed it.
GLM 5.1 is extremely good, and ridiculously cheap on their coding plan. Its far better than Sonnet, and a fifth of the cost at API rates. I don't know if the American providers can compete long-term; what good is it to be more innovative it only buys them a six month lead andthey can't build the data center capacity fast enough for demand? Chinese providers have a huge advantage in electrical grid capacity.
True but Z.ai also just silently raised the price, and the entire Chinese frontier set is having to make profit now... hence Alibaba killing the Lite plan and not letting people sign up to their Pro one either; and why MiniMax has their non-commercial license, etc. etc.
So I agree with you, its better than Sonnet but way cheaper. I do wonder how long that will last though
Z.ai does really well at the carwash question!
Thank you. I've been using ollama for a much more modest local inference system. I'll research some of the things you've mentioned.
[dead]