Comment by sosodev
5 hours ago
The problem with this question is that it encompasses a huge spectrum of capabilities and expectations. If you can only run an 8B model and expect it to be good at vibe coding / one shotting things you're going to have a bad time.
If you're able to run a model on the scale of ~30B, you can find that with a reasonably scoped and well defined task they do very well. I've found both Gemma4-31B and Qwen3.6-27B to be the best in this range at the moment. You can swap in the MoE models for faster inference, but they are noticeably worse at most tasks. They can one-shot / vibe code tasks with small scope, but still do much better with guidance.
If you really want frontier-like capabilities, you'll probably need at least 128GB of memory and either huge compute or a lot of patience. Most people just don't have either the money or the patience to make these local models work.
The patience required for local model usage goes far beyond just waiting for tokens though. It takes a lot of effort to get things configured and working properly for your workflow and hardware.
My experience with smaller models, in this case specifically GPT 5.4 Mini, is that they cannot two-shot moving a 10-20 line code change to another file without modifying it and introducing bugs.
I did not expect perfect reliability, but I thought they could at least get it right on the second attempt once you point out the difference. No such luck, it confidently tells you that now the code is the same, with yet another subtle bug added in the difference.
I don't know what work one would need to do where these garbage-class models would be adequate. Maybe they can masquerade as competent for a few minutes, but in the end the results simply are not right. At best they are suitable for a smarter search or autocomplete, in my opinion.
I use Gemma 4 26B A4B on my Macbook (M4 Pro, 48 GB RAM) to study Rust (and ask other myriad questions). I don't trust it to do a good job in an IDE/harness to one-shot anything but the most trivial of changes. Still, it's fast and good enough that it could handle being a "co-pilot" on small to medium context tasks where you've got your hands on the wheel and your eyes on the road — and are driving under the speed limit. That's remarkable given where we were a couple of years ago.
I don't think I'd be using AI to code at all if this weren't the case. (I don't want to feel stunted or stuck just from losing my internet connection.)