Comment by keerthiko · 12 days ago

Almost always, notes like these are going to be about greenfield projects.

Trying to incorporate it into existing codebases (especially when the end user is a support interaction or more away) is still folly, except for closely reviewed and/or non-business-logic modifications.

That said, it is quite impressive to set up a simple architecture, or just list the filenames, and tell some agents to go crazy implementing what you want the application to do. But once it crosses a certain level of complexity, I find you need to prompt closer and closer to the weeds to see real results. I imagine a non-technical prompter cannot proceed past a certain prototype fidelity threshold, let alone make meaningful contributions to a mature codebase via an LLM without a human engineer to guide and review.

That was true for me, but is no longer.

I'm using it on a large set of existing codebases full of extremely ugly legacy code, weird build systems, and tons of business logic, shipping directly to prod at breakneck growth over the last two years, and it's delivering the same type of value that Karpathy writes about.

It's been especially helpful in explaining and understanding arcane bits of legacy code behavior my users ask about. I trigger Claude to examine the code and figure out how the feature works, then tell it to update the documentation accordingly.
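
For what it's worth, the shape of that loop is easy to script. A minimal sketch using the Anthropic Python SDK; the model string, file path, question, and prompt wording are placeholders, and the agentic parts (searching the repo, actually applying the doc edit) are left out:

```python
# Rough sketch of the "explain this legacy feature, then draft a doc update" step.
# Uses the official anthropic Python SDK; the model string, file path, question,
# and prompt wording below are placeholders, not anyone's actual setup.
from pathlib import Path

import anthropic

MODEL = "claude-3-5-sonnet-latest"  # placeholder; substitute whatever model you use

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def explain_and_draft_docs(source_file: str, question: str) -> str:
    code = Path(source_file).read_text()
    prompt = (
        f"Here is {source_file}:\n\n{code}\n\n"
        f"A user asked: {question}\n"
        "Explain how this feature actually behaves, citing the key lines, "
        "then draft an updated section for the user-facing docs."
    )
    response = client.messages.create(
        model=MODEL,
        max_tokens=2000,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

if __name__ == "__main__":
    # Hypothetical file and question, purely for illustration.
    draft = explain_and_draft_docs("billing/proration.py",
                                   "Why is the first invoice prorated twice?")
    print(draft)  # a human still reads this before it goes anywhere near the docs
```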

  • > I trigger Claude to examine the code and figure out how the feature works, then tell it to update the documentation accordingly.

    And how do you verify its output isn't total fabrication?

    • I read through it, skimming the sections that seem uncontroversial and reading more closely the ones that cover things I'm less sure about. The output cites key lines of code, which are faster to track down and check than trying to remember where in a large codebase to look.

      Inconsistencies also pop up in backtesting: for example, if there's a point the LLM answers in different ways across multiple iterations, that's a good candidate for improving the docs (see the sketch at the end of this reply).

      As with a coworker's work, there's a certain amount of trust in the competence involved.
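
      For illustration, a minimal sketch of that backtesting idea; ask_llm is a hypothetical wrapper around whatever client you already use, and the "same answer" check is deliberately naive:

      ```python
      # Sketch of the "backtest for consistency" idea: ask the same question several
      # times and flag it when the answers disagree. ask_llm is a hypothetical
      # callable (wrap whatever client you use); the comparison is deliberately crude.
      from collections import Counter
      from typing import Callable

      def inconsistent(question: str, ask_llm: Callable[[str], str], runs: int = 5) -> bool:
          answers = [ask_llm(question).strip().lower() for _ in range(runs)]
          # More than one distinct answer across runs -> good candidate for better docs.
          return len(Counter(answers)) > 1

      # Usage, assuming some ask_llm implementation exists:
      # if inconsistent("How does proration interact with mid-cycle upgrades?", ask_llm):
      #     print("Docs are ambiguous here; worth a closer look.")
      ```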

These models do well at changing brownfield applications that have tests, because the constraints on a successful implementation are tight. Their solutions can also be automatically augmented by research and documentation.
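
To make "tight constraints" concrete, a hedged sketch of the kind of test doing the constraining: a characterization test that pins current behavior, so an agent's patch either keeps it green or surfaces a real regression. The prorate function below is a made-up stand-in for real legacy code:

```python
# Illustration of what "tight constraints" can look like: characterization tests
# that pin what the current code does. prorate here is a tiny made-up stand-in
# for a real legacy function; the values assert today's behavior, not a spec.
import pytest

def prorate(plan_price: int, days_used: int, days_in_cycle: int) -> int:
    """Stand-in legacy implementation (prices in cents)."""
    if days_in_cycle <= 0:
        raise ValueError("cycle must be at least one day")
    return plan_price * (days_in_cycle - days_used) // days_in_cycle

def test_mid_cycle_upgrade_credit_matches_current_behavior():
    # 10 days used out of 30 on a $30.00 plan -> $20.00 of remaining value.
    assert prorate(plan_price=3000, days_used=10, days_in_cycle=30) == 2000

def test_zero_length_cycle_is_rejected():
    with pytest.raises(ValueError):
        prorate(plan_price=3000, days_used=0, days_in_cycle=0)
```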

  • I don't exactly disagree with this, but I have seen a model simply delete the tests, or update them so they pass and declare the failures were "unrelated to my changes", which it then helpfully fixed.

    • I’ve had to deal with this a handful of times. You just have to make it restore the test, or keep it working against a suite of explicit red-green tests it wrote earlier.

    • Yes. You have to treat the model like an eager yet incompetent worker, i.e. don't go full YOLO mode; review everything it does (one small review aid for the test-tampering case is sketched below).
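
      One way to mechanize part of that review, as a rough sketch: diff the agent's branch against the base and flag any test file it deleted or modified for a human to look at. It assumes a git checkout with a base branch named main, and the "test in path" heuristic is deliberately loose:

      ```python
      # Small review aid: list test files an agent's branch deleted or modified,
      # so a human inspects them before merging. Assumes a git repo whose base
      # branch is "main"; adjust the branch name and path heuristic as needed.
      import subprocess
      import sys

      def touched_test_files(base: str = "main") -> list[tuple[str, str]]:
          out = subprocess.run(
              ["git", "diff", "--name-status", f"{base}...HEAD"],
              capture_output=True, text=True, check=True,
          ).stdout
          flagged = []
          for line in out.splitlines():
              status, _, path = line.partition("\t")
              if "test" in path and status[:1] in ("D", "M"):
                  flagged.append((status, path))
          return flagged

      if __name__ == "__main__":
          hits = touched_test_files()
          for status, path in hits:
              print(f"{status}\t{path}  <- agent touched a test; review by hand")
          sys.exit(1 if hits else 0)
      ```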