Comment by eeperson
8 days ago
Everybody produces bugs, but Claude is good at producing code that looks like it solves the problem but doesn't. Developers worth working with, grow out of this in a new project. Claude doesn't.
An example I have of this is when I asked Claude to copy some functionality from a front-end application to a back-end application. It got all of the function signatures right but then hallucinated the contents of the functions. Part of this functionality included a lookup map for some values. The new version had entirely hallucinated keys and values, but the values sounded correct if you didn't compare with the original. A human would have literally copied the original lookup map.
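For what it's worth, this kind of mismatch is cheap to catch mechanically if you diff the ported map against the original instead of eyeballing it. A minimal sketch in Python, assuming both maps can be loaded as plain dicts; the map contents and the `diff_lookup_maps` helper are made up for illustration and are not the actual code from this port:

```python
# Hypothetical stand-ins for the original front-end map and the ported back-end map.
ORIGINAL_MAP = {"pending": 1, "active": 2, "closed": 3}
PORTED_MAP = {"pending": 1, "active": 2, "archived": 4}


def diff_lookup_maps(original, ported):
    """Report keys the port dropped, keys it invented, and keys whose values differ."""
    missing = sorted(original.keys() - ported.keys())
    invented = sorted(ported.keys() - original.keys())
    changed = sorted(
        key for key in original.keys() & ported.keys()
        if original[key] != ported[key]
    )
    return {"missing": missing, "invented": invented, "changed": changed}


report = diff_lookup_maps(ORIGINAL_MAP, PORTED_MAP)
```

An empty report for all three lists means the port matches the original. Anything else is exactly the "sounds right but isn't" case.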
I asked Claude to help me figure out some statistical calculation in Apple Numbers. It helpfully provided the results of the calculation. I ignored it and implemented it in the spreadsheet and got completely different (correct) results. Claude did help me figure out how to do it correctly though!
> Developers worth working with, grow out of this in a new project. Claude doesn't.
There is no way this is true. People make fewer bugs with time and guidance, but no human makes zero bugs. Also, bugs are not planned; it's always easy to say in hindsight "A human would have literally copied the original lookup map," but every bug comes from some mistake that deviates from the status quo. That's why it's a bug.
No, it's broadly true. Also, that's why we have code review and tests, so that it has to pass a couple of filters.
LLMs don't make mistakes like humans make mistakes.
If you're a SWE at my company, I can assume you have a baseline of skill and you tested the code yourself, so I'm trying to look for any edge cases or gaps or whatever that you might have missed. Do you have good enough tests to make both of us feel confident the code does what it appears to do?
With LLMs, I have to treat the code like it came from a hostile adversary trying to sneak in subtle backdoors. I can't trust anything to be done honestly.
Sorry, perhaps I should have been clearer. They don't grow completely out of making bugs (although they do tend to make fewer over time); they grow out of making solutions that look right but don't actually solve the problem. This is because they understand the problem space better over time.