Comment by smusamashah

1 month ago

The codebase I work on at $dayjob$ is legacy: it has a few files with 20k lines each and a few more with around 10k lines each. It's hard to find things and connect the dots in the codebase. I don't think LLMs are able to navigate and understand codebases of that size yet. But I have seen lots of seemingly large projects shown here lately that involve thousands of files and millions of lines of code.

jumploops 1 month ago

I’ve found that LLMs seem to work better on LLM-generated codebases.

Commercial codebases, especially private internal ones, are often messy. It seems this is mostly due to the iterative nature of development in response to customer demands.

As a product gets larger and addresses a wider audience, there’s an ever-increasing chance of divergence between the initial assumptions and the new requirements.

We call this tech debt.

Combine this with a revolving door of developers, and you start to see Conway’s law in action, where the system resembles the organization of the developers rather than the “pure” product spec.

With this in mind, I’ve found success in using LLMs to refactor existing codebases to better match the current requirements (i.e. splitting out helpers, modularizing, renaming, etc.).

Once the legacy codebase is “LLMified”, the coding agents seem to perform more predictably.

YMMV here, as it’s hard to do large refactors without tests for correctness.

(Note: I’ve dabbled with a test-first refactor approach. I haven’t taken it far enough to claim it works, but I believe it could.)
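One way a test-first refactor could look is a characterization test: record what the legacy code returns for a grid of inputs, then require the refactored code to return exactly the same values. This is only a sketch under assumptions. `legacy_price`, `discount_multiplier`, `refactored_price`, and the input grid are all hypothetical stand-ins, not code from any real codebase.

```python
import itertools


def legacy_price(quantity, unit_price, is_member):
    # Hypothetical legacy function: nested conditionals grown over time.
    total = quantity * unit_price
    if is_member:
        if quantity > 10:
            total = total * 0.8
        else:
            total = total * 0.9
    else:
        if quantity > 10:
            total = total * 0.95
    return round(total, 2)


def discount_multiplier(quantity, is_member):
    # Hypothetical helper split out during the refactor.
    bulk = quantity > 10
    if is_member:
        return 0.8 if bulk else 0.9
    return 0.95 if bulk else 1.0


def refactored_price(quantity, unit_price, is_member):
    # Refactored version that delegates the discount rules to the helper.
    return round(quantity * unit_price * discount_multiplier(quantity, is_member), 2)


# Input grid chosen to straddle the quantity > 10 boundary (an assumption).
CASES = list(itertools.product([1, 10, 11, 50], [0.99, 20.0], [True, False]))


def snapshot(price_fn):
    # Map every input case to the value the function currently returns.
    return {case: price_fn(*case) for case in CASES}


def behavior_preserved():
    # Take the baseline from the legacy code, then compare the refactor to it.
    baseline = snapshot(legacy_price)
    return snapshot(refactored_price) == baseline
```

The snapshot pins current behavior, bugs included, so it only says the refactor changed nothing on the sampled inputs. It says nothing about whether the legacy behavior was right.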

mh2266 1 month ago
are LLM codebases not messy?
Claude by default, unless I tell it not to, will write stuff like:
    // we need something to be true
    somethingPasses = something()
    if (!somethingPasses) {
      return false
    }
    // we need somethingElse to be true
    somethingElsePasses = somethingElse()
    if (!somethingElsePasses) {
      return false
    }
    return true
instead of the very simple boolean logic that could express this in one line, with the "this code does what it obviously does" comments added all over the place.
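For comparison, here is a rough Python rendition of both shapes. It is a sketch only: `something` and `something_else` are hypothetical stand-ins for whatever the real checks are, and they are assumed to return plain booleans.

```python
def something() -> bool:
    # Hypothetical stand-in for the first check.
    return True


def something_else() -> bool:
    # Hypothetical stand-in for the second check.
    return False


def verbose_check() -> bool:
    # The step-by-step shape: one temporary and one early return per check.
    something_passes = something()
    if not something_passes:
        return False
    something_else_passes = something_else()
    if not something_else_passes:
        return False
    return True


def concise_check() -> bool:
    # Same result in one line. `and` short-circuits, so something_else()
    # is skipped when something() is False, just like the early return.
    return something() and something_else()
```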
generally unless you tell it not to, it does things in very verbose ways that most humans would never do, and since there's an infinite number of ways that it can invent absurd verbosity, it is hard to preemptively prompt against all of them.
to be clear, I am getting a huge amount of value out of it for executing a bunch of large refactors and "modernization" of a (really) big legacy codebase at scale and in parallel. but it's not outputting the sort of code that I see when someone prompts it "build a new feature ...", and a big part of my prompts is screaming at it not to do certain things or to refuse the task if it at any point becomes unsure.
- jumploops 1 month ago
  
  Yeah to be clear it will have the same issues as a flyby contributor if prompted to.
  Meaning if you ask it “handle this new condition” it will happily throw in a hacky conditional and get the job done.
  I’ve found the most success in having it reason about the current architecture (explicitly), and then to propose a set of changes to accomplish the task (2-5 ways), review, and then implement the changes that best suit the scope of the larger system.
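That staged workflow could be scripted roughly as below. This is a sketch under assumptions: the function name, the prompt wording, and the four-stage split are all made up for illustration, and it only builds strings rather than calling any particular agent or API.

```python
def build_refactor_prompts(task: str, min_options: int = 2, max_options: int = 5) -> list[str]:
    # Return the staged prompts in the order they would be sent.
    # The wording is hypothetical; adapt it to the agent being used.
    return [
        # Stage 1: make the architecture reasoning explicit before any change.
        f"Describe the current architecture relevant to this task: {task}",
        # Stage 2: ask for several alternatives instead of the first hack.
        f"Propose {min_options}-{max_options} distinct ways to accomplish the task, "
        "with trade-offs for each.",
        # Stage 3: hold for human review of the proposals.
        "Stop and wait for review. Do not write code until one proposal is approved.",
        # Stage 4: implement only what was approved.
        "Implement the approved proposal, keeping changes within the scope of the larger system.",
    ]
```

Keeping the review stage as its own prompt makes it harder for the agent to skip straight from a proposal to the hacky conditional.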
  
olig15 1 month ago

Surely that's because LLM-generated code is part of the model's training data, so the code and patterns in an LLM-generated codebase are closer to what the model was trained on.