Comment by trjordan

15 hours ago

> Those are not code problems. They are evaluation problems.

> Code becomes precious when it is the only place knowledge lives.

Reading AI code all day is _agonizing_. Just, a horrible way to live, and it melts people's brains at the moment you need them to be the most capable.

Manual programming has this really productive and gratifying feedback loop, where you read the code, write the code, and fix it until it compiles/runs/does what you want. AI code not only does half that for you, but it makes the "click" at the end uninspiring because you're never sure if it's cheated a bit to get to that moment.

Trying to operate with AI-generated code as the only durable artifact of programming is a dead end for the industry. Charity points to (and correctly discards) architecture diagrams/specs as an interesting space to work in. My suspicion is that the answer is closer to the thing that's hand-written: prompts, markdown plans, and other nudges. Focus on the thing that you, as a human, produce. That's the basis for the core loop of "did the AI follow my instructions," and it's higher-leverage when you go to code review.

By the time you get to the PR, you've probably typed enough to Claude that you can regenerate the code, but the current industry default is to just throw away all those sessions and ship the code. That's backwards!

30 comments


philbo 15 hours ago

If a coworker dumped a 5k-line code review on you, you'd tell them to come back when it's broken down into smaller, reviewable chunks. Large dumps of code are basically unreviewable by humans, but it seems like a lot of people have forgotten about that when it comes to LLMs.

trjordan 15 hours ago
I think it's worse than that. At least if I dumped 5k LoC on somebody in 2021, you knew I'd spent the time to write it, so it was "fair" to ask you to read it. But in 2026, I didn't write it, so you shouldn't read it.
I think it's less about "break it down" and more about "let's communicate at the same altitude."
I wrote a (bait-titled) post about it: https://tern.sh/blog/stop-reading-prs/
- fusslo 14 hours ago
  
  113 files +22913 −2423
  305 files +15075 −13110
  153 files +21934 −8698
  125 files +28120 −2398
  43 files +11188 −63
  118 files +21564 −647
  These are the largest 6 (of 35) PRs in the past 30 days. In the last 6 months: 190,079 lines added, 39,696 removed.
  All from one person.
  
roncesvalles 13 hours ago
You aren't allowed to block PRs for being too large anymore. The objective that every engineer should be 2x/3x/5x more productive can only be achieved if you go totally lax on code reviews.
Because if all your SWEs produce 5x more code, it also means they have to review 5x more code. But LLMs don't really help with code reviews. Then it becomes a Metcalfian paradox unless you just rubberstamp PRs, which is what is expected of you.
- vanuatu 13 hours ago
  
  It's pretty easy to point your terminal agent at your giant PR and ask it to break it up into small PRs.
  If you're being asked to rubber-stamp PRs, that's a management skill issue.
darth_aardvark 15 hours ago
Breaking up a giant PR can be a tedious, time-consuming hassle, and in the past I could sympathize in practice if someone had a giant PR they didn't have time to decompose once they got it working.
But it's also the exact sort of thing that LLMs are literally perfect for in my experience so there's really no excuse anymore. I've never seen Claude fail to turn a 5k PR into a well-decomposed Graphite stack.
- xmodem 13 hours ago
  
  Hell, I've hand-written a large PR as a single commit and then asked Claude to break it down for me at least once. But I think doing this task by hand is a tedious, time-consuming hassle not because it inherently has to be, but because the tooling for it has barely changed in the past 15 years.
zmmmmm 8 hours ago

Now you get not just the 5k LoC to review but also a 5-page essay to read in the form of an auto-generated review. That makes the submitter even more indignant when you start nitpicking how it's implemented.
win311fwg 15 hours ago
It is not so much forgetting as it is acceptance that, when welcoming AI into a codebase, the code can no longer matter; all that matters is that the properties of the system are validated. That isn't a change that comes free, so nobody should be expecting magic. It is a different set of tradeoffs. There is no such thing as a panacea.
- rienbdj 8 hours ago
  
  > all that matters is that the properties of the system are validated
  I don’t think this is possible in practice without leaning on the stability of the code base.
- ChrisLTD 10 hours ago
  
  How can the code no longer matter? It literally is the logic (not to mention performance, and reliability) of the software.
  
hootz 15 hours ago
I think they expect you to also use an LLM to review, and I bet they are doing exactly that when asked to review someone else's code.
- latentsea 14 hours ago
  
  That gets you 90% of the way there, so it only really works if you accept the cruft and the risks associated with that last 10%. I've been doing this day in, day out for the last few months, and no matter how many automated reviews we run or how good they get, we still can't skip the manual ones.
- acedTrex 13 hours ago
  
  There's really no difference between a rubber stamp and an LLM review; they both do the same thing.
  
cmrdporcupine 14 hours ago

> If a coworker dumped a 5k-line code review on you, you'd tell them to come back when it's broken down into smaller, reviewable chunks.
I would, and all my training at Google told me to do that. But what I found after I left that comfortable box was that somehow this kind of practice is acceptable in the industry at large and you're expected to just Deal With It(tm). 5k lines isn't even high by what I've seen.
Worse, the "code review" tools people have access to in GitHub make incremental review absolutely and totally unworkable. Messy merge commits full of "responding to code review" comments. Threads impossible to follow. Just bad tooling.
So a lot of shops, from what I've seen, are just yeeting it with very shallow reviews.
This is my observation pre agentic AI. LLMs just threw kerosene on that dumpster fire.

vjvjvjvjghv 10 hours ago

"Reading AI code all day is _agonizing_. Just, a horrible way to live, and it melts people's brains at the moment you need them to be the most capable."

I think it's very similar to dealing with large offshore teams. Every day you get a huge pile of code to review. It's really exhausting.

I prefer dealing with AI because at least it tends to follow rules once I write them down. Not so much with a lot of offshore guys. Same mistakes every day.

I guess my company needs to hire better offshore devs....

gavinh 13 hours ago

I agree that reading AI code all day is agonizing. We're relying on code review to develop parts of our mental model of the system that were previously developed through coding, and we're having more difficulty comprehending and recalling details of the system. This is probably unsurprising: people recall information they "generated" better than information they read. I am applying some lessons from pedagogy to extend code review. If this resonates with you, I would like to talk.

mooreds 15 hours ago

Are there any products out there that are capturing the prompts/sessions? I imagine you could do it in an ad hoc way, asking Claude to write up a summary of the session as part of the commit message. But is there anything more structured or higher-level?

sdesol 13 hours ago

I am working on solving the AI code provenance problem, and I believe my repos may be the first to provide AI code provenance. See the following example:
https://github.com/gitsense/gsc-cli/blob/main/internal/cli/r...
Notice how the code block header attributes the model. The UUID can be traced to the conversation, so everybody can tell exactly how the code came about. For this to work, though, you need to use my chat app, which ensures things can't be tampered with, if you are truly serious about AI code provenance.
I also have a much more human-focused method, which is part of my CLI tool.
https://github.com/gitsense/gsc-cli
I am currently looking at making pi (https://github.com/earendil-works/pi) support AI code provenance, but for now if you want a more structured way to capture what you have done in an agent session that can be used in code reviews and be carried forward as knowledge that lives inside your repository, I have
gsc lessons
The basic idea is, after you have finished chatting/working with the agent, you would work with it to identify lessons worth carrying forward. You can store your session if you want, but really, the lessons should be something that can help you review code better and to prevent future mistakes.
I have a real working example at
https://github.com/gitsense/smart-ripgrep
This is a fork of the BurntSushi/ripgrep repository. It shows how you can use lessons to learn from past design decisions.
trjordan 15 hours ago

We're working on it, though it's all early. I'd love feedback: https://tern.sh
First product compares the code to the prompts and highlights places the agent made decisions you weren't involved in: https://tern.sh/docs/tours/
latentsea 14 hours ago

We just have a hook that runs on git push and instructs Claude to ensure the PR description is up to date. Works well enough for us.
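A minimal sketch of that kind of hook, for anyone who wants to try it. The assumptions here are mine, not the commenter's: it's a Python script installed as .git/hooks/pre-push, it shells out to Claude Code's non-interactive `claude -p` mode, and the function names and prompt wording are hypothetical. Git feeds pre-push one stdin line per pushed ref, which is enough to tell the agent which commits to look at.

```python
#!/usr/bin/env python3
# Hypothetical .git/hooks/pre-push sketch (make it executable with chmod +x).
# Assumption: the `claude` CLI (Claude Code) is on PATH; all names are illustrative.
import shutil
import subprocess
import sys

ZERO_SHA = "0" * 40  # git uses an all-zero SHA to mean "this ref does not exist"


def parse_pre_push(stdin_text):
    """Parse git's pre-push input: one line per pushed ref, formatted as
    <local ref> <local sha> <remote ref> <remote sha>."""
    updates = []
    for line in stdin_text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue
        local_ref, local_sha, remote_ref, remote_sha = parts
        if local_sha == ZERO_SHA:
            continue  # the remote ref is being deleted; nothing to describe
        updates.append((local_ref, local_sha, remote_ref, remote_sha))
    return updates


def build_prompt(local_ref, local_sha, remote_sha):
    """Turn one pushed ref into an instruction for the agent."""
    if remote_sha == ZERO_SHA:
        # New branch: show commits not already on any remote-tracking ref.
        log_cmd = f"git log -p {local_sha} --not --remotes"
    else:
        log_cmd = f"git log -p {remote_sha}..{local_sha}"
    return (
        f"Run `{log_cmd}` to see what is being pushed from {local_ref}. "
        "Then check whether the pull request description for this branch "
        "still matches these changes, and update it if it does not."
    )


def main():
    if shutil.which("claude") is None:
        print("pre-push: `claude` not found; skipping PR description check", file=sys.stderr)
        return
    for local_ref, local_sha, _remote_ref, remote_sha in parse_pre_push(sys.stdin.read()):
        # `claude -p` runs a single prompt non-interactively, then exits.
        subprocess.run(["claude", "-p", build_prompt(local_ref, local_sha, remote_sha)], check=False)
    # Returning normally exits 0, so the hook nudges but never blocks the push.


if __name__ == "__main__":
    main()
```

It deliberately never fails the push, which matches "works well enough" rather than a hard gate. In non-interactive mode Claude Code can only do what your permission settings allow, so it may need to be permitted to run whatever command actually edits the PR description.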
mplanchard 6 hours ago

So many. The sibling comments, plus GitAI, plus empathic, plus many others

deaton 11 hours ago

It almost seems like the juice might not be worth the squeeze. If you want verifiable code that conforms to a well-designed plan, you basically have to write pseudocode and have the AI translate it for you. At that point, why use the AI to write the code at all? And personally, I just have more fun planning, writing, and debugging myself. I think it's kinda the part of programming that I fell in love with in the first place.

pydry 10 hours ago

This is the core of the insanity nobody who vibe codes seems to be able to grasp.
It's not even that it's more fun. AI can spew endless slop boilerplate, but it simply can't boil an application down to its essential components in a way that makes it straightforward to maintain and bug-free.

keybored 13 hours ago

Flintstone Engineering is applying Space Age synthetic intelligence technology (in a metaphorical sense) to code generation. Babysitting generated code, version-controlling it, and so on should be a thing of the past. But that is what GenAI is.

At the very least apply it at a higher level: specification, proofs, anything but generating Rust/Java/C and then letting yourself or an agent babysit it.

agumonkey 13 hours ago

The act, eval, adjust loop is probably neurologically important. Reading about things you didn't dive into yourself is a real dread.

Depending on your industry, you might be able to ship half-slop and then fix some bugs downstream, though.