Comment by danaw
16 hours ago
i have a strong suspicion that the most productive software teams that leverage llms to build quality software will use them for the following:
- intelligent autocomplete: the "OG" llm use for most developers, where the generated code is just an extension of your active thought process. you maintain the context of the code being worked on rather than outsourcing your thinking to the llm
- brainstorming: llms can be excellent at taking a nebulous concept/idea/direction and expanding on it in novel ways that can spark creativity
- troubleshooting: llms are quite good at debugging an issue like a package conflict, random exception, or bug report, and helping guide the developer to the root cause. they can be very useful when you're stuck and don't have a teammate one chair over to reach out to
- code review: our team has gotten a lot of value out of AI code review, which tends to find at least a few things human reviewers miss. it's not a replacement for human code review, more akin to a smarter linting step
- POCs: llms can be good at generating a variety of approaches to a problem that can then be used as inspiration for a more thoughtfully built solution
these uses accelerate development while still putting the onus on the developers to know what they're building and why.
related, i feel it's likely teams that go "all in" on agentic coding are going to inadvertently sabotage their product and their teams in the long run.
> intelligent autocomplete
I'm curious how much value others are finding in this. Personally I turned it off about a year ago and went back to traditional (jetbrains) IDE autocomplete. In my experience the AI suggestions would predict exactly what I wanted < 1% of the time, were useful perhaps 10% of the time, and otherwise were simply wrong and annoying. Standard IDE features allowing me to quickly search and/or browse methods, variables, etc. are far more useful for translating my thoughts into code (i.e. minimizing typing).
Same, I use Claude but cannot stand typing and being constantly flashed with suggestions that aren't right, having to keep hitting escape to cancel them. It's either manual or full AI for me. This happens in a lot of web tools that have been enhanced with AI, like a few databases with web UIs that allow querying. They are so bad. I really wish they would just dump the whole schema into the context before I begin, because I don't need fancy autocomplete; I need schema, table, and column autocomplete wayyy more than I need it to scaffold out a SELECT for me.
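Roughly the shape of what I mean, as a sketch (the types and names here are made up, not any real tool's API):

    // Serialize the schema into a plain-text preamble so completions
    // know the real table and column names. Illustrative only.
    type Column = { name: string; type: string };
    type Table = { name: string; columns: Column[] };

    function schemaPreamble(tables: Table[]): string {
      return tables
        .map(t => `${t.name}(${t.columns.map(c => `${c.name} ${c.type}`).join(", ")})`)
        .join("\n");
    }

    // e.g. schemaPreamble([{ name: "users", columns: [{ name: "id", type: "uuid" }] }])
    // => "users(id uuid)"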
I have it on a long timer so that I have to pause for a while before the auto-complete prompt appears. I've found I tend to deliberately set things up for it when I know I'm going to have to type a bunch of boilerplate or some code that's logically straightforward but syntactically fiddly, i.e. I write a quick comment describing what the next few lines should do and then wait a few seconds for it to make the suggestion.
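For example (contrived, just to show the pattern, not real completion output):

    // what I type: a one-line description of the fiddly bit
    // build a query string from [key, value] pairs
    // ...and then the completion fills in the boilerplate below
    function toQueryString(pairs: [string, string][]): string {
      return pairs
        .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
        .join("&");
    }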
Even worse, I've seen the JetBrains AI auto-complete insert hard-to-spot bugs, like two nested for loops with i and j as loop index variables, where the inner loop was fairly complex and incorrectly used i instead of j in one place.
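Something like this contrived sketch (not the actual code):

    const rows: number[][] = [[1, 2], [3, 4]];
    let total = 0;
    for (let i = 0; i < rows.length; i++) {
      for (let j = 0; j < rows[i].length; j++) {
        // ...imagine several lines of real work here...
        total += rows[i][i]; // bug: should be rows[i][j], very easy to miss
      }
    }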
perhaps it depends on language or domain, but for me it's usually a minimum of 50%, and often 80%, of what i'm looking for (lots of web stuff like typescript, svelte, cloudflare workers, tailwind, etc).
I'm with you on all apart from code review.
Our team has tried a couple of tools. Most of the issues highlighted are either very surface-level or non-issues. When it reviews code from the less competent team members, it misses deeper issues which human review has caught, such as when the wrong change has been made to solve a problem that could be solved in a better way.
Our manager uses it as evidence to affirm his bias that we don't know what we're doing. It got to the point that he was using a code review tool and pasting the emoji-littered output into the PR comments. When we addressed some of the minor issues (extra whitespace, for example) he'd post "code review round 2". Very demoralising, and some members of the team ended up giving up on reviewing altogether and just approving PRs.
I think it's ok to use it to review your own code, but I don't think it should be an enforced constraint in a process, because the entire point of code review from the start was to invest time in helping one another improve. When that is outsourced to a machine, it breaks down the social contract within the team.
Indeed, "it misses deeper issues […] such as when the wrong change has been made", which human review will catch.
What it will do is notice inconsistencies, like a savant who can actually keep 12 layers of abstraction in mind at once: tiny logic gaps with outsized impact, a typing mistake that will lead to data corruption downstream, a one-variable change that completely changes your error handling semantics in a particular case, etc. It has been incredibly useful in my experience; it just serves a different purpose than a peer review.
yup - security reviews.
ouch, sounds like your manager is more of a problem than the llm review!
i find it a good backstop to catch dumb mistakes or suggest alternatives, but it's not a replacement for human review (we require human review; llm suggestions are always optional and you're free to ignore them)
Formatting should be handled by deterministic tools with formally specified rules, like prettier. This should never be a part of code review.
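E.g. a minimal prettier.config.mjs (the specific options are just illustrative):

    // deterministic formatting rules, enforced by the tool and CI
    // rather than argued about in review
    export default {
      printWidth: 100,
      singleQuote: true,
      trailingComma: "all",
    };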
IME it's impossible to fight these people. They have to learn through consequences. There's no other way.
Don't give up on the automated code review entirely though, the models and prompts are getting better every day.
FWIW I was watching an interview with the founder of Claude Code and he claims that at Anthropic, no code is written by hand anymore.
https://www.youtube.com/watch?v=SlGRN8jh2RI&pp=0gcJCQMLAYcqI...
That explains the spaghetti ball that is CC
I'd add rapid mockups/prototyping as well. Not suitable for production use but very suitable for iterating until it looks right, and then you go and make it for real.
On troubleshooting, either LLMs used to be better, or I'm on a huge bad-luck streak. Every one of the last few times I asked one, I got a perfectly believable and completely wrong answer that wasn't even on the right subject.
On code review, the amount of false positives is absolutely overwhelming. And I see no reason for that to improve.
But yes, LLMs can probably help on those lines.
I've found them super hit-or-miss for debugging. I've gone down several rabbit holes where the LLM wasted hours of my time on a simple fix. On the other hand, they're awesome for ripping through thousands of log lines and then correlating them to something dumb happening in your codebase. My modus operandi with them for debugging is basically "distrust but consider": I'll let one rip in the background while I go and debug myself, and if it finds the solution, great; if not, well, I haven't spent much time or effort trying to convince it to find the problem.
this can absolutely happen and i've experienced it myself recently. that said, i'd say it's still better than some of the alternatives, and i've had probably 60-80% luck with it when properly prompted
what models have you been using that are the least helpful?
I usually use git and open source tooling, but I've been working with our internal tech stack recently. It includes an editor with AI-powered autocomplete, and it drives me crazy.
It populates suggestions nearly instantly, which is constantly distracting. They're often wrong (either not the comment I was leaving, or code that's not valid). Most of the normal navigation keys implicitly accept the suggestion, so I spend an annoying amount of time editing code I didn't write, and fighting with the tool to STFU and let me work. Sometimes I'll try what it suggests only to find out that it doesn't build or is broken in other stupid ways.
All of this with the constant anxiety to "be more productive because AI."
oof. nothing like a home-grown tool that gets in your way more than it helps!
i especially find suggestions distracting in markdown, which i feel is the key place i really don't want an llm interfering with my ability to communicate with the other developers on my team
This is one of the most insightful comments I've read on the subject in a while, minus the code review.
All the described use cases are a good fit for AI except code review, which is hit or miss.
But agentic coding is snake oil.
appreciate the compliment!
i don't see llm code review as any kind of replacement for human review; more as a backstop to catch things a human might miss (just today an llm caught an unimplemented feature in a POC that would otherwise have been easy for a human to miss)
the most productive teams will be the ones that treat code as compiler output (which we never read)
legacy manual codebases which require human review will be the new "maintaining a FORTRAN mainframe". they'll stick around for longer than you'd expect (because they still work), at legacy, stagnant engineering companies
i disagree, because i see code as the actual product of the thought behind it. it is, after all, a description of the intent of the programmer, and programming languages are what we use to communicate with machines
that said, we will see over the next few years who is right!
Even generating a first pass at the eventual production code that you can step back and review is useful for getting ideas, so long as you guard against the laziness of going with the first answer it provides.
100%. even having them come up with a few very different competing solutions can be really valuable to explore the problem space
> related, i feel it's likely teams that go "all in" on agentic coding are going to inadvertently sabotage their product and their teams in the long run.
They are trying to get warm by pissing their pants.
lol it does have that vibe
people have been making some version of this comment for the past three years, and the only thing that has changed is that you keep adding capabilities.
2 years ago people were saying it was purely autocomplete and enhanced google.
AI bears just continue to eat shit year after year and keep pretending they didn't say that AI would never be capable of what it's currently capable of.
i'll bite. the uses for llms i've described are about what i've been using them for since chatgpt 3.5. they've absolutely gotten better since then, but i still find them to be very poor replacements for humans, esp in regard to architectural direction. they're very useful assistants tho