Comment by TheRoque
6 days ago
Well, I'll try to jot down a quick list here:
- they can't be aware of the latest changes in the frameworks I use, and so force me to use older features, sometimes less efficient
- they fail at doing clean DRY practices even though they are supposed to skim through the codebase much faster than me
- they bait me into nonexistent APIs, or hallucinate solutions or issues
- they cannot properly pick the context and the files to read in a mid-size app
- they suggest downloading random packages, sometimes low-quality or unmaintained ones
"they can't be aware of the latest changes in the frameworks I use, and so force me to use older features, sometimes less efficient"
That's mostly solved by the most recent ones that can run searches. I've had great results from o4-mini for this, since it can search for the latest updates - example here: https://simonwillison.net/2025/Apr/21/ai-assisted-search/#la...
Or for a lot of libraries you can dump the ENTIRE latest version into the prompt - I do this a lot with the Google Gemini 2.5 models since those can handle up to 1m tokens of input.
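Roughly, that pattern looks like this. A minimal sketch using the google-generativeai SDK, where the checkout path, model name, and question are placeholders of mine, not a recommendation:

```python
# Sketch of the "dump the whole library into the prompt" pattern.
# The checkout path, model name and question are placeholders.
import pathlib
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Concatenate every source file from a local checkout of the library.
library_root = pathlib.Path("path/to/library/src")  # hypothetical checkout
corpus = "\n\n".join(
    f"# FILE: {path}\n{path.read_text()}"
    for path in sorted(library_root.rglob("*.py"))
)

# Gemini 2.5 models accept up to 1m input tokens, so whole libraries often fit.
model = genai.GenerativeModel("gemini-2.5-pro")
response = model.generate_content(
    corpus + "\n\nUsing the library source above, show me how to ..."
)
print(response.text)
```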
"they fail at doing clean DRY practices" - tell them to DRY in your prompt.
"they bait me into inexisting apis, or hallucinate solutions or issues" - really not an issue if you're actually testing your code! I wrote about that one here: https://simonwillison.net/2025/Mar/2/hallucinations-in-code/ - and if you're using one of the systems that runs your code for you (as promoted in tptacek's post) it will spot and fix these without you even needing to intervene.
"they cannot properly pick the context and the files to read in a mid-size app" - try Claude Code. It has a whole mechanism dedicated to doing just that, I reverse-engineered it this morning: https://simonwillison.net/2025/Jun/2/claude-trace/
"they suggest to download some random packages, sometimes low quality ones, or unmaintained ones" - yes, they absolutely do that. You need to maintain editorial control over what dependencies you add.
Thanks for the links. You mentioned 2 models in your posts, how should I proceed? I can't possibly pay 2 subscriptions... do you have a suggestion for the better one to use?
If you're only going to pay one $20/month subscription I think OpenAI wins at the moment - their search tools are better and their voice chat interface is better too.
I personally prefer the Claude models but they don't offer quite as rich a set of extra features.
If you want to save money, consider getting API accounts with them and spending money that way. My combined API bill across OpenAI, Anthropic and Gemini rarely comes to more than about $10/month.
> Or for a lot of libraries you can dump the ENTIRE latest version into the prompt - I do this a lot with the Google Gemini 2.5 models since those can handle up to 1m tokens of input.
See, as someone who is actually receptive to the argument you are making, sometimes you tip your hand and say things that I know are not true. I work with Gemini 2.5 a lot, and while yeah, it theoretically has a large context window, it falls over pretty fast once you get past 2-3 pages of real-world context.
> "they fail at doing clean DRY practices" - tell them to DRY in your prompt.
Likewise here. Simply telling a model to be concise has some effect, to be sure, but it's not a panacea. I tell the latest models to do all sorts of obvious things, only to have them turn around and ignore me completely.
In short, you're exaggerating. I'm not sure why.
I stand by both things I said. I've found that dumping large volumes of code into the Gemini 2.5 models works extremely well. They also score very highly on the various needle-in-a-haystack benchmarks.
This wasn't true of the earlier Gemini large context models.
And for DRY: sure, maybe it's not quite as easy as "do DRY". My longer answer is that these things are always a conversation: if it outputs code that you don't like, reply and tell it how to fix it.
Those aren't tasks.
> they can't be aware of the latest changes in the frameworks I use, and so force me to use older features, sometimes less efficient
This is where collaboration comes into play. If you solely rely on the LLM to “vibe code” everything, then you’re right, you get whatever it thinks is best at the time of generation. That could be wrong or outdated.
My workflow is to first provide clear requirements, generally one objective at a time. Sometimes I use an LLM to format the requirements for the LLM that will generate the code. It then writes some code, and I review it. If I notice something is outdated, I give it a link to the docs and tell it to update the code using X. A few seconds later it’s made the change.
I did this just yesterday when building out an integration with an API. Claude wrote the code using a batch endpoint because the streaming endpoint was just released and I don’t think it was aware of it.
My role in this collaboration is to be aware of what’s possible and how I want it to work (e.g., being aware of the latest features and updates of the frameworks and libraries). Then it’s just about prompting and directing the LLM until it works the way I want. When it’s really not working, then I jump in.
> they can't be aware of the latest changes in the frameworks I use, and so force me to use older features, sometimes less efficient

Of course they can: teach them, feed them the latest changes or whatever else you need (much like you would another developer who is unaware of the same thing).

> they fail at doing clean DRY practices even though they are supposed to skim through the codebase much faster than me

Tell them it is not DRY until they make it DRY. For some projects (several I’ve been involved with), DRY is actually an anti-pattern when taken to extremes (abstraction gone awry, etc…). Instruct it on what you expect and watch it deliver (much like you would another developer…).

> they bait me into nonexistent APIs, or hallucinate solutions or issues

Tell it when it hallucinates, it’ll correct itself (see the sketch after this list).

> they cannot properly pick the context and the files to read in a mid-size app

Provide it with context (you should always do this anyway).

> they suggest downloading random packages, sometimes low-quality or unmaintained ones

Tell it about it, it will correct itself.
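You can even automate that correction loop. A minimal sketch with the openai SDK, where the model name, prompt, and retry count are arbitrary placeholders (real agent tools do this far more robustly):

```python
# Minimal sketch of an automated "tell it when it hallucinates" loop.
# Model name, prompt and retry count are arbitrary placeholders.
import subprocess
import sys
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "Write a Python script that ..."}]

for attempt in range(3):  # a few correction rounds are usually enough
    reply = client.chat.completions.create(model="gpt-4o", messages=messages)
    code = reply.choices[0].message.content  # in practice, strip any ``` fences
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    if result.returncode == 0:
        break  # the code ran cleanly
    # Hallucinated APIs show up here as ImportError / AttributeError;
    # feeding the traceback back makes the model revise its answer.
    messages.append({"role": "assistant", "content": code})
    messages.append(
        {"role": "user", "content": f"That failed with:\n{result.stderr}\nFix it."}
    )
```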
Anecdotally, ChatGPT still struggles with its own API. It keeps jumping between different versions of the API and hallucinates API parameters, even when I force-feed the official docs into the context (to be fair, the documentation is straight-up awful). Sometimes it totally refuses to change its basic assumptions, so I have to blow up the context just to make it use the up-to-date API correctly.
LLMs are stupid - nothing magic, nothing great. They’re just tools. The problem with the recent LLM craze is that people keep making statements that are obviously only partially true.
That's because GPT-4o's training cut-off is Sep 30, 2023 (see https://platform.openai.com/docs/models/gpt-4o) and the OpenAI API has changed a LOT since then.
Claude 4 has a training cut-off of March 2025, I tried something today about its own API and it gave me useful code.
> tell it when it hallucinates, it’ll correct itself
no it doesn't. Are you serious?
Yes it does - just today, three times, and countless times before… you just gotta take some serious time to learn and understand it… or, alternatively, write snarky comments on the internet…
> - they bait me into nonexistent APIs, or hallucinate solutions or issues
Yes. This happens to me almost every time I use it. I feel like a crazy person reading all the AI hype.
I have definitely noticed these as well. Have you ever tried prompting these issues away? I'm thinking this might be a good list to add to every coding prompt
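Something like a standing preamble, perhaps. Untested, and the wording below is just illustrative:

```python
# Sketch: turning the complaints above into a standing prompt preamble.
# The wording is illustrative, not a tested prompt.
GUARDRAILS = """\
- Use the current APIs of the frameworks in this repo; say so if unsure.
- Keep the code DRY; reuse existing helpers instead of duplicating logic.
- Never invent APIs; only call functions you can point to in docs or code.
- Ask for the specific files you need for context; do not guess.
- Do not add new dependencies without flagging them for review.
"""

def with_guardrails(task: str) -> str:
    """Prefix every coding prompt with the standing instructions."""
    return f"{GUARDRAILS}\nTask: {task}"
```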
They also can’t hold copyright on their creations.