Comment by _bin_

10 months ago

I tried it three or four times before giving up, and it did this every single time. I checked the tool call output, and it was running cargo check appropriately. I think maybe the 30B-scale models just aren't sufficient for typical development.

You're generally correct, though, that starting from scratch gets better results. That is a huge limitation of these models: I don't want a model that will write something its way. I've already worked through my design and settled on the style, principles, and libraries for a reason; the bot working terribly within those choices is a major flaw, and I don't see "let the bot do things its preferred way" as a good answer. In some systems, things like latency matter, and the bot's way just isn't good enough.

The vast majority of man-hours go to maintaining and extending code, not greenfielding new stuff. Vendors should be hyper-focused on this and on compliance with user directions, not on building something that makes a React todo-list app marginally faster or better than competitors do.

999900000999 10 months ago

If anything, it's a good sign that these tools are nowhere close to replacing us.

I was trying to get Postgres working with a project the other day, and Claude decided that it was going to just replace it with SQLite when it couldn't get the build to work.

All I want is an "I don't know how to do this." Instead, these tools would rather just do it wrong.

They also have a very strong tendency to force unoptimized solutions. You'll get three classes that do the exact same thing with only minor variable differences, something a human would do in one class.
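
A made-up Python sketch of that pattern, for illustration only; the exporter classes are hypothetical and not taken from any real project. The first three classes differ only in the delimiter, and the last one folds that difference into a constructor argument:

```python
# The duplicated style: one near-identical class per variant.
# All class names here are hypothetical.
class CsvExporter:
    def export(self, rows):
        return "\n".join(",".join(str(v) for v in row) for row in rows)


class TsvExporter:
    def export(self, rows):
        return "\n".join("\t".join(str(v) for v in row) for row in rows)


class PipeExporter:
    def export(self, rows):
        return "\n".join("|".join(str(v) for v in row) for row in rows)


# The consolidated style: one class, with the varying detail as a parameter.
class DelimitedExporter:
    def __init__(self, delimiter):
        self.delimiter = delimiter

    def export(self, rows):
        return "\n".join(
            self.delimiter.join(str(v) for v in row) for row in rows
        )
```

With this version, DelimitedExporter(",") stands in for CsvExporter, DelimitedExporter("\t") for TsvExporter, and so on, so a fix to the export logic only has to be made once.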

For my latest project I'm strongly tempted to just suck it up and code the whole thing by hand.