Comment by csbartus

2 days ago

It's a gut feeling.

We _know_ LLMs can't be _as_ good as they are promoted to be.

I've spent the last 6 months creating a production-grade app from scratch with Claude, and I didn't write a single line of code myself. I reviewed the code and it looked good, almost completely following my templates, workflows, and skills.

Now I've started to make minor manual updates and I'm horrified. Claude had no idea why those templates and instructions were in place. It followed them blindly without grasping their spirit. The end result is like a very junior dev copy-pasting answers from Stack Overflow into the codebase. No consistency, chaotic application of different conventions, duplicated code, ghost code (does nothing), and perhaps more as I keep digging in.

The pros: The code works, all tests pass (43% code / 57% tests, 1:1.3 ratio), and the UI looks good, though with visible glitches.

The cons: I'll have to rewrite most of the code in the long run to make it fit together and easy to maintain.

The verdict: I wouldn't have started this project alone. Claude got me through to v0.1.0 / MVP, where I focused solely on the product: technologies, architecture, functionality, and usability. Now it's easier to refactor everything for v0.2.0 manually, without Claude.

So this might be our gut feeling: we know it's something good, but not as good as the stakeholders might promote. We know it helps in some ways but it's a nightmare in other ways.

We are not anti-AI but rather pragmatic: not the AI enthusiasts we are expected to be.

> No consistency, chaotic application of different conventions, duplicated code, ghost code (does nothing), and perhaps more as I'm digging in.

I didn’t understand this part. You said you reviewed the code and it was looking good, so how did the cruft creep in? Were you reviewing every diff, or taking an occasional sample?

  • Reviewing is a very different mindset from writing it yourself. You don't have all the context you would have built up had you done it, and it's much, much more difficult to think through all cases. So I'm thinking: the individual changes all looked good in isolation, and they started borderline rubber-stamping the changes without stepping back to think about the larger context.

    Looking at the individual changes in isolation, it's harder to see that a change doesn't match other conventions, duplicates code, removes or disables paths without cleaning up, etc. I'll bet there's also some crazy spaghetti code in there, judging from my experience helping a co-worker clean up AI-generated code that they didn't understand.