Comment by Octoth0rpe
15 hours ago
> That is, a feature can largely be written in one file, rather than bits and pieces all over the codebase.
This seems to be at odds with the goal of token minimization. Lots of small, narrowly scoped files mean less has to be loaded into context when making a change, right?
Throwing out another idea: I wonder if we could see some kind of equivalent of C header files for more modern languages, so that an LLM only has to read the equivalent of a .h file to start using a library.
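Python already has something close in .pyi stub files. A minimal sketch of what such an interface-only view could look like; the payments module and every name in it are made up:

```python
# payments.pyi -- hypothetical stub for a hypothetical payments.py.
# Signatures, types, and docstrings only: an agent reads ~15 lines
# instead of the full implementation.
from decimal import Decimal

class Charge:
    id: str
    amount: Decimal
    currency: str

def create_charge(amount: Decimal, currency: str = "usd") -> Charge:
    """Create a charge; raises on decline."""
    ...

def refund(charge_id: str, partial: Decimal | None = None) -> Charge:
    """Refund a charge, optionally partially."""
    ...
```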
> This seems to be at odds with the goal of token minimization. Lots of small, narrowly scoped files mean less has to be loaded into context when making a change, right?
My solution (as someone who's building something tangential) is to use granular levels of scope: an implicit single file gets generated from a package at a certain phase of the static tooling, but the package stays split into files for flexibility and DevEx (developer experience). File/folder organization is super useful for humans. For tooling, the package can be collected and treated as a single unit, yet still decomposed by namespace and by top-level definitions (classes, specifications, etc.). That way the tooling controls how much context to pass in.
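A rough sketch of that idea (not the commenter's actual tool): collect a package into one indexed unit keyed by top-level definition, then let the caller pick the granularity:

```python
# Index a package by module and top-level definition, so tooling
# decides how much context to hand the LLM. Pure stdlib sketch.
import ast
from pathlib import Path

def index_package(pkg_dir: str) -> dict[str, dict[str, str]]:
    """Map each module path to its {top-level name: source snippet}."""
    index: dict[str, dict[str, str]] = {}
    for path in sorted(Path(pkg_dir).rglob("*.py")):
        src = path.read_text()
        defs = {}
        for node in ast.parse(src).body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                defs[node.name] = ast.get_source_segment(src, node)
        index[str(path)] = defs
    return index

def render(index: dict[str, dict[str, str]], level: str) -> str:
    """Emit the package at a chosen granularity."""
    if level == "names":  # cheapest: just the symbol listing per module
        return "\n".join(f"{mod}: {', '.join(defs)}" for mod, defs in index.items())
    if level == "full":   # the implicit generated single file
        return "\n\n".join(src for defs in index.values() for src in defs.values())
    raise ValueError(level)
```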
I think AST-aware code reading is criminally underused by agents - you don't need a header file if you can see a listing of all the functions in a library.
Similarly, I don't read the whole file a function is in while editing it in an IDE, so why should a coding agent get the whole file polluting its context by default?
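For what it's worth, that listing is a few lines of stdlib Python (assuming a Python target; 3.9+ for ast.unparse):

```python
# List every function signature in a file, no bodies, no header file.
import ast
import sys

src = open(sys.argv[1]).read()
for node in ast.walk(ast.parse(src)):
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
        ret = f" -> {ast.unparse(node.returns)}" if node.returns else ""
        print(f"{node.name}({ast.unparse(node.args)}){ret}")
```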
Check out Ataraxy-Labs/weave for AST-aware git merges.
But, I wonder, do AST-aware tools cleave to the LLM training manifold the way coding-tutorial slop does?
Why would you need "header files" when an LSP server can give you just the outline of a file?
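That outline is a standard LSP request, textDocument/documentSymbol. A minimal sketch of the JSON-RPC message an agent harness would send to an already-initialized server (the file URI is made up):

```python
# Build a textDocument/documentSymbol request with standard LSP framing.
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "textDocument/documentSymbol",
    "params": {"textDocument": {"uri": "file:///repo/src/payments.py"}},
}
body = json.dumps(request)
# LSP wire format: a Content-Length header, blank line, then the JSON body.
message = f"Content-Length: {len(body)}\r\n\r\n{body}"
# The response is a tree of DocumentSymbol objects (names, kinds, ranges):
# an outline, with zero function bodies entering the context window.
```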