Comment by karmasimida
5 days ago
Does local AI have a future? The models are getting ridiculously big, storage hardware is being hoarded by a few companies for the next 2 years, and Nvidia has stopped making consumer GPUs for this year.
It seems to me there is no chance local ML gets beyond toy status compared to the closed-source models in the short term.
Mistral have small variants (3B, 8B, 14B, etc.), as do others like IBM Granite and Qwen. Then there are finetunes based on these models, depending on your workflow/requirements.
True, but anything remotely useful is 300B and above
That is a very broad and silly position to take, especially in this thread.
I use Devstral 2 and Gemini 3 daily.
I am actually now doing a good part of my dev work with Qwen3-Coder-Next on an M1 64GB with Qwen Code CLI (a fork of Gemini CLI). I very much like
Also I never have to wait in a queue, nor will I be told to wait for a few hours. And I get many answers in a second.
I don't do full vibe coding with a dozen agents though. I read all the code it produces and guide it where necessary.
Last but not least, at some point the VC-funded party will be over, and when that happens you had better know how to use AI tokens efficiently.
How many tokens per second are you getting?
What's the advantage of Qwen Code CLI over OpenCode?
320 tok/s PP (prompt processing) and 42 tok/s TG (token generation) with a 4-bit quant and MLX. llama.cpp was half that for this model, but afaik it improved a few days ago; I haven't tested it yet though.
I have tried many tools locally and was never really happy with any. I finally tried Qwen Code CLI, assuming that it would run well with a Qwen model, and it does. YMMV; I mostly do JavaScript and Python. The most important setting was the max context size: it then auto-compacts before reaching it. I run with 65536 but may raise this a bit.
Last but not least, OpenCode is VC-funded, so at some point they will have to make money, while Gemini CLI / Qwen CLI are not the primary products of their companies but are definitely dogfooded.
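For a feel of what those numbers mean in practice, here is a rough back-of-the-envelope sketch. It assumes the 320 tok/s prompt processing (PP) and 42 tok/s token generation (TG) figures quoted above stay constant regardless of context length, and that nothing is cached between turns; both are simplifications, so real timings will differ. The function name `estimate_seconds` and the example request sizes are made up for illustration.

```python
def estimate_seconds(prompt_tokens: int, output_tokens: int,
                     pp_tok_s: float = 320.0, tg_tok_s: float = 42.0) -> float:
    """Rough wall-clock estimate: time to read the prompt plus time to write the answer.

    Assumes constant throughput and no prompt caching (both simplifications).
    """
    return prompt_tokens / pp_tok_s + output_tokens / tg_tok_s


if __name__ == "__main__":
    # Hypothetical request sizes; 65_536 is the max context size mentioned above.
    for prompt, output in [(2_000, 500), (16_000, 500), (65_536, 500)]:
        secs = estimate_seconds(prompt, output)
        print(f"{prompt:>6} prompt + {output} output tokens: ~{secs:.0f} s")
```

Under these assumptions, prompt processing dominates once the context gets large, which is one reason capping the context size and letting the tool compact matters on this kind of hardware.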