Comment by LPisGood

2 months ago

And it’s slower to review because you didn’t do the hard part of understanding the code as it was being written.

You're holding it wrong.

Set the boundaries and guidelines before it starts working. Don't leave it space to do things you don't understand.

i.e.: enforce conventions, set specific and measurable/verifiable goals, and define skeletons of the resulting solutions if you want/can.

To give an example: I do a lot of image similarity work, and I wanted to try out Redis VectorSet while it was still in beta, but the PHP extension for Redis (the fastest one, written in C as a proper language extension rather than a runtime library) didn't support the new commands. I cloned the repo, fired up Claude Code, and pointed it at a local copy of the Redis VectorSet documentation I'd put in the directory root, telling it to update the extension to support the new commands I'd need to handle VectorSets. This was, idk, maybe a year ago, so not even Opus. It nailed it. But I chickened out about pushing that into a production environment, so I then told it to just write me a PHP runtime client that mirrors the functionality of Predis (a pure-PHP Redis client) but executes shell commands from PHP instead (lmao, I know).
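A minimal sketch of that shell-command client, transliterated to Python for illustration (the actual one was PHP). The VADD/VSIM syntax follows the Redis VectorSet documentation; the class name, method names, and host/port defaults are all made up here:

```python
import subprocess

class ShellVectorSetClient:
    """Hypothetical sketch of the shell-based approach described above:
    a thin runtime client that builds Redis VectorSet commands and runs
    them through redis-cli, one process per command."""

    def __init__(self, cli="redis-cli", host="127.0.0.1", port=6379):
        self.base = [cli, "-h", host, "-p", str(port)]

    def _argv(self, *parts):
        # Build the full argv; kept separate from execution so the
        # command construction can be inspected without a live Redis.
        return self.base + [str(p) for p in parts]

    def _run(self, *parts):
        # Spawns redis-cli for every command -- the "lmao, I know" part.
        out = subprocess.run(self._argv(*parts), capture_output=True, text=True)
        return out.stdout.strip()

    def vadd(self, key, vector, element):
        # VADD key VALUES <dim> <v1..vn> <element>, per the VectorSet docs
        return self._run("VADD", key, "VALUES", len(vector), *vector, element)

    def vsim(self, key, vector):
        # VSIM key VALUES <dim> <v1..vn> -- similarity search
        return self._run("VSIM", key, "VALUES", len(vector), *vector)
```

Every call pays process-spawn overhead that the C extension avoids, which is fine for evaluating a beta feature but not for a hot path.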

Define the boundaries, give it guard rails, use design patterns and examples (where possible) that can be used as reference.

  • They aren't holding it wrong; it's a fundamental limitation of not writing the code yourself. You can make the code easier to understand when you review it later, but you still need to put in that effort.

    • Work in smaller parts, then. You should have a mental model of what the code is doing. If the LLM is generating too much, you're being too broad. Break the problem down. Solve smaller problems.

      All the old techniques and concepts still apply.


  • You are correct, but developers are not yet ready to face it. The argument you'll always get rests on the flawed premise that it's less effort to write it yourself (while the same people work on teams where others write code for them every day of the week).

  • So, in my experience with Opus 4.6, evaluating it in an existing code base has gone like this.

    You say "Do this thing".

    - It does the thing (takes 15 min). Looks incredibly fast. I couldn't code that fast. It's inhuman. So far all the fantastical claims hold up.

    But still. You ask "Did you do the thing?"

    - it says oops I forgot to do that sub-thing. (+5m)

    - it fixes the sub-thing (+10m)

    You say is the change well integrated with the system?

    - It says not really, let me rehash this a bit. (+5m)

    - It irons out the wrinkles (+10m)

    You say does this follow best engineering practices, is it good code, something we can be proud of?

    - It says not really, here are some improvements. (+5m)

    - It implements the best practices (+15m)

    You say to look carefully at the change set and see if it can spot any potential bugs or issues.

    - It says oh, I've introduced a race condition at line 35 in file foo and a null correctness bug at line 180 of file bar. Fixing. (+15m)

    You ask if there's test coverage for these latest fixes?

    - It says "i forgor" and adds them. (+15m)

    Now the change set has shrunk a bit and is superficially looking good. Still, you must read the code line by line, and an experienced eye will still find weird stuff happening in several of the functions: there are redundant operations, and resources aren't always freed up. (60m)

    You ask why it's implemented in such a roundabout way and how it intends for the resources to be freed up?

    - It says "you're absolutely right" and rewrites the functions. (+15m)

    You ask if there's test coverage for these latest fixes?

    - It says "i forgor" and adds them. (+15m)

    Now the 15 minutes of amazingly fast AI code gen has ballooned into taking most of the afternoon.

    Telling Claude to be diligent, not to write bugs, or to write high-quality code flat out does not work. And even if such prompting can reduce the odds of omissions or lapses, you still always, always, always have to check the output. It cannot find all the bugs and mistakes on its own. If there are bugs in its training data, you can assume there will be bugs in its output.

    (You can make it run through much of this Socratic checklist on its own, but this doesn't really save wall clock time, and doesn't remove the need for manual checking.)

It's the same as asking one of your juniors to do something, except now it follows instructions a little better. Coding has never been about line generation, and now you can POC something in a few hours instead of a few days or weeks to see if an idea is dumb.

  • LLMs can easily output overwhelming quantities of code. Junior devs couldn't really do that, not consistently.

    Scale/quantity matter.

    This industry is not mature enough for 1000x the bad code we have now. It was barely hanging on with 1x bad code.

    • Yeah. Due diligence is exponentially more important with something like Claude because it is so fast. Get lazy for a few hours and you've easily added 20K LOC worth of technical debt to your code base, and short of reverting the commits and starting over, it won't be easy to get it to fix the problems after the fact.

      It's still pretty fast even considering all the coaxing needed, but holy crap will it rapidly deteriorate the quality of a code base if you just let it make changes as it pleases.

      It very much feels like how the most vexing enemy of The Flash is just some random-ass banana peel on the road. Raw speed isn't always an asset.
