Comment by Arch-TK
2 days ago
"actually good enough to meet the goals?"
There's "okay for now" and then there's "this is so crap that if we set our bar this low we'll be knee deep in tech debt in a month".
A lot of LLM output in the specific areas _I_ work in is firmly in that latter category, and often it just doesn't work.
So I can tell you don't use these tools much, if at all, because at the speed of development they enable you'll be knee-deep in tech debt in a day, not a month. But as a corollary, the same agentic coding tools can do the equivalent of weeks of tech-debt cleanup the next day. Well, I think this applies to greenfield, AI-first projects that work this way from the get-go and with few humans in the loop (human-to-human communication definitely becomes the rate-limiting step). But I imagine that's not the nature of your work.
Yes, if I went hard on something greenfield I'm sure I'd be knee-deep in tech debt in less than a day.
That being said, given the quality of code these things produce, I just don't see that ever changing. They require a lot of supervision, and at some point you're spending more time asking for revisions than you would just writing it yourself.
There's a world of difference between an MVP, which in the right domain you can now get done much faster, and a finished product.
I think you missed your parent post's phrase "in the specific areas _I_ work in"... LLMs are a lot better at CRUD and boilerplate than at novel hardware interfaces and a bunch of other domains.
But why would it take a month to generate significant tech debt in novel domains? It would accrue even faster there, right? The main idea I wanted to get across is that iteration speed is much faster, so what's "tech debt" on the first pass can be addressed much faster in future passes, and those passes happen on the order of days rather than sprints in the older paradigm. Yes, the first iterations will have a bunch of issues, but if you keep your hands on the controller you can get things to a decent state quickly. I think one of the biggest gaps I see in devs using these tools is what they do after the first pass.
Also, even for novel domains, using deep-research tools and these agents' ability to search the internet outright, including public repos, during the planning phase is a huge level-up. (You are planning before implementing, right? Not just opening a window and asking, in a few sentences, for a vaguely defined final product, I hope.)
If there are repos, papers, articles, etc. on your novel domain out there, there's a viable research -> plan -> implement -> iterate path, imo, especially once you get better at giving the tools ways to evaluate their own results, rather than going back and forth yourself for hours telling them "no, this part is wrong; no, now this part is wrong", and so on.
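As a rough illustration of that last point, here's a minimal sketch of a self-evaluation loop, assuming a pytest suite; `my-agent-cli` is a hypothetical stand-in for whatever coding-agent CLI you actually use. The test suite does the back-and-forth instead of you:

```python
import subprocess

MAX_PASSES = 5

def run_tests() -> tuple[bool, str]:
    """Run the project's test suite; return (passed, combined output)."""
    proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def run_agent(prompt: str) -> None:
    """Hypothetical: invoke your coding-agent CLI. Substitute your own tool."""
    subprocess.run(["my-agent-cli", "--prompt", prompt], check=True)

task = "Implement the change described in PLAN.md."
for _ in range(MAX_PASSES):
    run_agent(task)
    passed, output = run_tests()
    if passed:
        break
    # Feed the failures back automatically instead of relaying them by hand.
    task = ("The previous change fails these tests. Fix the failures "
            "without weakening or deleting the tests:\n" + output)
```

Nothing clever here; the point is that the evaluation criterion is encoded once, so each iteration costs tokens instead of your attention.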
I mean, there's also: "this looks fine, but if I had actually written this code I would've naturally spent more time on it, which would have led me to anticipate its future just a little bit more, and I'll only feel that awkwardness when I come back to it in two weeks, and then we'll do it all over again." It's a spectrum.
Right.
And greenfield code is some of the most enjoyable to write, yet apparently we should let the robots do the thing we enjoy most and reserve the most miserable tasks for humans, since the robots appear unable to do those.
I have yet to see an LLM or coding agent that can be prompted with "Please fix subtle bugs" or "Please retire this technical debt as described in issue #6712."
If you're willing to purchase enough tokens, you can prompt an agent to loop and fuzz its way to "retire* this technical debt as described in issue #6712". But then you still need to review it and make sure it's not just doing a "debt-swap", like some kind of metaverse financial swindler. So you're spending money to avoid fixing tech debt, but adding the need to review "someone else's code". And to take ownership of that code! (A cheap partial guard is sketched after the footnote.)
*(Of course, depending on the issue, it could be doing anything from suppressing logs so existing tests pass, to making useless-but-passing tests, to brute-forcing special cases, to possibly actually fixing something.)
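One cheap, partial guard against the debt-swap failure mode is to reject any agent pass that touched the test suite, so "make the tests pass" can't be satisfied by editing the tests. A minimal sketch, assuming a git repo with tests under tests/ and the agent's work landing as the latest commit:

```python
import subprocess

def agent_touched_tests(base_ref: str = "HEAD~1") -> bool:
    """Return True if the latest commit modified anything under tests/."""
    proc = subprocess.run(
        ["git", "diff", "--name-only", base_ref, "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return any(p.startswith("tests/") for p in proc.stdout.splitlines())

if agent_touched_tests():
    raise SystemExit("Agent modified the test suite; review by hand before merging.")
```

It won't catch log suppression or brute-forced special cases, but it screens out the laziest version of the swindle before a human ever looks.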