Comment by tptacek

1 day ago

There really is a category of these posts that are coming from some alternate dimension (or maybe we're in the alternate dimension and they're in the real one?) where this isn't one of the most important things ever to happen to software development. I'm a person who didn't even use autocomplete (I use LSPs almost entirely for cross-referencing --- oh wait that's another thing I'm apparently never going to need to do again because of LLMs), a sincere tooling skeptic. I do not understand how people expect to write convincingly that tools that reliably turn slapdash prose into median-grade idiomatic working code "provide little value".

> I do not understand how people expect to write convincingly that tools that reliably turn slapdash prose into median-grade idiomatic working code "provide little value".

Honestly, I'm curious why your experience is so different from mine. Approximately 50% of the time for me, LLMs hallucinate APIs, which is deeply frustrating and sometimes costs me more time than it would have taken to just look up the API. I still use them regularly, and the net value they've imparted has been overall greater than zero, but in general, my experience has been decidedly mixed.

It might be simply that my code tends to be in specialized areas in which the LLM has little training data. Still, I get regular frustrating API hallucinations even in areas you'd think would be perfect use cases, like writing Blender plugins, where the documentation is poor (so the LLM has a relatively higher advantage over reading the documentation) and examples are plentiful.

Edit: Specifically, the frustrating pattern is: (1) the LLM produces some code that contains hallucinated APIs; (2) in order to test (or even compile) that code, I need to write some extra supporting code to integrate it into my project; (3) I discover that the APIs were hallucinated because the code doesn't work; (4) now I not only have to rewrite the LLM's code, but I also have to rewrite all the supporting code I wrote, because it was based around a pattern that didn't work. Overall, this adds up to more time than if I had just written the code from scratch.

  • You're writing Rust, right? That's probably the answer.

    The sibling comment is right though: it matters hugely how you use the tools. There's a bunch of tricks that help and they're all kind of folkloric. And then you hear "vibe coding" stories of people who generate their whole app from a prompt, looking only at the outputs; I might generate almost my whole project from an LLM, but I'm reading every line of code it spits out and nitpicking it.

    "Hallucination" is a particularly uninteresting problem. Modern LLM coding environments are closed-loop ("agentic", barf). When an LLM "hallucinates" (i.e., is wrong, like I am many times a day) about something, it figures it out pretty quickly when it tries to build and run it!

    • I haven’t had much of a problem writing Rust code with Cursor, but I’ve got dozens of crates’ docs, the Rust book, and the Rustonomicon indexed in Cursor, so whenever I have it touch a piece of code, I @-include all of the relevant docs. If a library has a separate docs site with tutorials and guides, I’ll usually index those too (like the cxx book for binding C++ code).

      I also monitor the output as it is generated, because Rust Analyzer and/or cargo check have gotten much faster and I find out about hallucinations early on. At that point I cancel the generation and edit the original message (rather than sending a new one) with updated context, usually by @-ing another doc or web page or adding an explicit instruction to do or not to do something.

  • One of the frustrating things about talking about this is that the discussion often sounds like we're all talking about the same thing when we talk about "AI".

    We're not.

    Not only does it matter what language you code in, but the model you use and the context you give it also matter tremendously.

    I'm a huge fan of AI-assisted coding, it's probably writing 80-90% of my code at this point, but I've had all the same experiences that you have, and still do sometimes. There's a steep learning curve to leveraging AIs effectively, and I think a lot of programmers stop before they get far enough along on that curve to see the magic.

    For example, right now I'm coding with Cursor and I'm alternating between Claude 3.7 max, Gemini 2.5 pro max, and o3. They all have their strengths and weaknesses, and all cost for usage above the monthly subscription. I'm spending like $10 per day on these models at the moment. I could just use the models included with the subscription, but they tend to hallucinate more, or take odd steps around debugging, etc.

    I've also got a bunch of documents and rules set up for Cursor to guide it in terms of what kinds of context to include for the model. And on top of that, there are things I'm learning about what works best in terms of how to phrase my requests, what to emphasize or tell the model NOT to do, etc.

    Currently I usually start by laying out as much detail about the problem as I can, pointing to relevant files or little snippets of other code, linking to docs, etc, and asking it to devise a plan for accomplishing the task, but not to write any code. We'll go back and forth on the plan, then I'll have it implement test coverage if it makes sense, then run the tests and iterate on the implementation until they're green.

    It's not perfect. I have to stop it and back up often, and sometimes I have to dig into docs and get more details that I can hand off to shape the implementation better. I've cursed in frustration at whatever model I'm using more than once.

    But overall, it helps me write better code, faster. I never could have built what I've built over the last year without AI. Never.

    • > Currently I usually start by laying out as much detail about the problem as I can

      I know you are speaking from experience, and I know that I must be one of the people who hasn't gotten far enough along the curve to see the magic.

      But your description of how you do it does not encourage me.

      It sounds like the trade-off is that you spend more time describing the problem and iterating on the multiple waves of wrong or incomplete solutions, than on solving the problem directly.

      I can understand why many people would prefer that, or be more successful with that approach.

      But I don't understand what the magic is. Is there a scaling factor where once you learn to manage your AI team in the language that they understand best, they can generate more code than you could alone?

      My experience so far is net negative. Like the first couple weeks of a new junior hire. A few sparks of solid work, but mostly repeating or backing up, and trying not to be too annoyed at simpering and obvious falsehoods ("I'm deeply sorry, I'm really having trouble today! Thank you for your keen eye and corrections, here is the FINAL REVISED code, which has been tested and verified correct"). Umm, no it has not, you don't have that ability, and I can see that it will not even parse on this fifteenth iteration.

      By the way, I'm unfailingly polite to these things. I did nothing to elicit the simpering. I'm also confused by the fawning apologies. The LLM is not sorry, why pretend? If a human said those things to me, I'd take it as a sign that I was coming off as a jerk. :)
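For anyone who hasn't seen the "closed-loop" workflow mentioned in the replies above (generate, build and run, feed the error back, repeat), here is a minimal sketch of the idea in Python. Everything in it is an assumption made for illustration: `check`, `closed_loop`, and `fake_model` are hypothetical names, not any real tool's API, and `fake_model` is a stub standing in for an LLM that invents a nonexistent function on its first try. Real tools run a compiler or test suite in a sandbox instead of calling `exec` directly.

```python
from typing import Callable, Optional, Tuple


def check(source: str) -> Optional[str]:
    """Compile and run the candidate; return an error message, or None if it ran cleanly.

    Illustration only: executing generated code unsandboxed like this is unsafe.
    """
    try:
        exec(compile(source, "<candidate>", "exec"), {})
    except Exception as exc:  # a made-up API typically surfaces here
        return f"{type(exc).__name__}: {exc}"
    return None


def closed_loop(
    generate: Callable[[str, Optional[str]], str],
    task: str,
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Ask for code, run it, and hand any error back to the generator until it passes."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        candidate = generate(task, feedback)
        feedback = check(candidate)
        if feedback is None:
            return candidate, round_no
    return None, max_rounds  # gave up: every round failed


def fake_model(task: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM: the first answer calls a function that does not exist."""
    if feedback is None:
        return "import math\nresult = math.square_root(16)\n"
    return "import math\nresult = math.sqrt(16)\n"


code, rounds = closed_loop(fake_model, "take the square root of 16")
print(rounds)
print(code)
```

With the stub above, the first round fails with an `AttributeError` and the second round passes. Note that this only catches the error automatically; it does not address the cost of supporting code written around a pattern that turns out not to work.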

> tools that reliably turn slapdash prose into median-grade idiomatic working code

This may be the crux of it.

Turning slapdash prose into median-grade code is not a problem I can imagine needing to solve.

I think I'm better at describing code in code than I am in prose.

I Want to Believe. And I certainly don't want to be "that guy", but my honest assessment of LLMs for coding so far is that they are a frustrating Junior, who maybe I should help out because mentoring might be part of my job, but from whom I should not expect any near-term technical contribution.

  • It is most of the problem of delivering professional software.

    • Not in my experience.

      The only slapdash prose in the cycle is in the immediate output of a product development discussion.

      And that is inevitably too sparse to inform, without the full context of the team, company, and industry.
