Comment by 999900000999

10 months ago

Very impressive. It doesn't need to be as good as the pay-per-token models. For example, I probably spent at least $300 last month on vibe coding. A big part of that is wanting to know what tools I'm going to end up competing with; another is that I got a working implementation of one of my side projects and then decided I wanted it rewritten in another programming language.

Even if I chill out a bit here, a refurbished Nvidia laptop would pay for itself within a year. I am a bit disappointed that Ollama can't handle the full flow yet, i.e. it could be a single command:

ollama code qwen3

4 comments


_bin_ 10 months ago

I just tried it. It got stuck looping on a `cargo check` call and literally wouldn't do anything else. No additional context, just repeatedly spitting out the same tool call.

The problem is the best models barely clear the bar for some stuff in terms of coherence and reliability; anything else just isn't particularly usable.

999900000999 10 months ago
This happens when I'm using Claude Code too. Even the best models need humans to get unstuck.
From what I've seen, most of them are good at writing new code from scratch.
Refactoring is very difficult.
- _bin_ 10 months ago
  
  I tried it 3-4 times before giving up and it did this every single time. I checked the tool call output and it was running cargo check appropriately. I think maybe the 30b-scale models just aren't sufficient for typical development.
  You're generally correct, though, that from-scratch gets better results. This is a huge constraint: I don't want a model that will write something its way. I settled on my design and its style/principles/libraries for a reason; the bot working terribly with that is a major flaw, and I don't see "let the bot do things its preferred way" as a good answer. In some systems, things like latency matter, and the bot's way just isn't good enough.
  The vast majority of man-hours go to maintaining and extending code, not green-fielding new stuff. Vendors should be hyper-focused on this, on compliance with user directions, not on building something that makes a React todo-list app marginally faster or better than competitors.
  