Comment by AndyNemmity
17 hours ago
That's completely fair, and I also don't have much faith in that anymore. Very often, the people who make those claims have the most basic implementation, one that barely counts as an implementation at all.
I'm not sure if the problems you run into with using LLMs will be solved if you do it my way. My problems are solved doing it my way. If I heard more about your problems, I would have a specific answer to them.
These are the solutions to where I have run into issues.
For sure, but my solutions are not "feed the error back into the LLM." My solutions are varied, but as the blog shows, they come down to moving as much as possible into scripts and deterministic solutions, and keeping the LLM to the smallest possible scope.
The current state of things is extremely useful for a subset of things. That subset feels small to me. But it may be that everything a certain person wants to do falls within that subset.
It just depends. We're all doing radically different things, and trying very different things.
I certainly understand and appreciate your perspective.
That makes sense.
My basic problem is: "first-run" LLM agent output frequently does one or more of the following: fails to compile/run, fails existing test coverage, or fails manual verification. Handling the first two has been pretty well automated by agents: inspect output, try to fix, re-run. IME this works really well for things like Python, less well for things like certain Rust edge cases around lifetimes and such, or goroutine coordination, which require a different sort of reasoning than "typical" procedural programming.
But let's assume that the agents get even better at figuring out the deal with the more specialized languages/features and are able to iterate w/o interaction to fix things.
If the first-pass output still has issues, I still have concerns. They aren't "I'm not going to use these tools" concerns, because I also sometimes write bugs, and they can write the vast majority of code faster than I can.
But they are "I'm not gonna vibe-code my day job" concerns because the existence of trivially-catchable issues suggests that there are likely harder-to-catch issues that will need manual review to make sure (a) test coverage is sufficient, (b) the mental model being implemented is correct, (c) the outside world is interacted with correctly. And I still find bugs in these areas that I have to fix manually.
This all adds up to "these tools save me 20-30% of my time" (the first-draft coding) vs "these agents save me 90% of my time."
So I've kinda been at a plateau for a few months, where it'll be hard to convince me to try new things to close that 20-30% -> 90% gap.
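For scale, the 20-30% and 90% figures above can be converted into speedup multipliers. This is a minimal sketch, assuming "time saved" means the fraction of total task time eliminated; the function name is illustrative and not from the thread.

```python
def speedup_from_time_saved(fraction_saved: float) -> float:
    """Convert a fraction of time saved (0 <= f < 1) into a throughput multiplier.

    If a task that took time T now takes T * (1 - f), the same amount of
    work gets done 1 / (1 - f) times as fast.
    """
    if not 0 <= fraction_saved < 1:
        raise ValueError("fraction_saved must be in [0, 1)")
    return 1.0 / (1.0 - fraction_saved)


# Figures quoted in the comment above: 20-30% today vs. the claimed 90%.
for saved in (0.20, 0.30, 0.90):
    print(f"{saved:.0%} time saved -> {speedup_from_time_saved(saved):.2f}x")
```

Under that assumption, 20-30% saved works out to roughly 1.25x-1.43x, while 90% saved would be 10x, so the gap being described is close to an order of magnitude rather than a few percentage points.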
I experience the same things. What I’ve found is that there is no issue I can’t fix so that it doesn’t repeat.
The real issue is that I don’t know the issues ahead of time. So each experience is an iteration that stops something I didn’t know would happen.
Thankfully, I’m not trying to sell anyone anything. I don’t even want people to use what I use. I only want people to understand the why of what I do, and how it adds value for me.
I think it’s important to understand this thing we use as best we can.
The personal value you can get is entirely up to your tolerance for it.
I just enjoy the process.
For new-ish projects it should give you some crazy speed up out of the box.
For large codebases (my own has 500k lines and my company has a few tens of millions) you need something better like RPI.
If nothing else just being able to understand code questions basically instantly should give you a large speed up, even without any fancy stuff.
Damn, it really is all just vibes eh? Everyone just vibes their way to coding these days, no proof AI is actually doing anything for you. It's basically just how someone feels now: that's reality.
In some sense, computers and digital things have now just become a part of reality, blending in by force.
I mean, it’s not vibes. I make real projects, and the failures of AI doing it force me to make fixes so that it only ever fails doing that thing once. Then it no longer fails to do that thing.
But the things I am doing might not be the things you are doing.
If you want proof, I intend to release a game to the App Store and Steam soon. At that point you can judge if it built a thing adequately.
No offense intended, I don't even know you at all, but I see people claim things like you did so often these days that I begin to question reality. These claims always have some big disclaimer, as yours does. I still don't know a single personal acquaintance who has claimed even a 2x improvement on general coding efficiency, not even 1.5x in general efficiency. Some of my coworkers say AI is good for this or that, but I literally just waste my time and money when I use it, I've never gotten good results or even adequate results to continue trying. I feel like I am taking crazy pills sometimes with all of the hype!
I hope you're just one of the ones who figured it out early and all the hype isn't fake bullshit. I'd much rather be proven wrong than for humanity to have wasted all this time and resources.