Comment by svachalek

15 hours ago

Yeah, it seems to be based on 2023 research, which is ancient: back then we didn't have coding agents at all. And it leans on some 1980s sci-fi concept of "how machines think" (beedeeboop) rather than the all-too-human coding agents we actually have.

If I had to design one of these, I'd go for:

1. Token minimization (which may be circular; I'm sure tokens for these models are selected at least in part based on the syntax of popular languages)

2. As many compile time checks as possible (good for humans, even better for machines with limited context)

3. Maximum locality. That is, a feature can largely be written in one file rather than in bits and pieces all over the codebase, because of how context and attention work. This is the one I don't see much in commercially popular languages; it's more of a declarative thing, "configuration-driven development".

Features written in one file, rather than "cohesive" modules, each with a single "responsibility", one per file?

So, orthogonal to the accepted, common code organization idiom (no matter how infrequently adhered to)?

Fascinating! Just the other day I decomposed a massive Demeter violation into stepwise proxying "message passing." I was concerned that implementing this entire feature (well, at least a solid chunk of it) as a single, feature-scoped module would cause the next developer's eyes to glaze over upon encountering such a ball of mud, such a dense vortex of spaghetti.

But, as I drove home that evening, I couldn't help but wonder if I hadn't, instead, merely buried the Gordian lede behind so many ribbons of silk.

> That is, a feature can largely be written in one file, rather than bits and pieces all over the codebase.

This seems to be at odds with the goal of token minimization. Lots of small files that are narrowly scoped means less has to be loaded into context when making a change, right?

Throwing out another idea: I wonder if we could see some kind of equivalent of C header files for more modern languages, so that an LLM just has to read the equivalent of a .h file to start using a library.
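As a rough illustration of the header-file idea, here's a minimal sketch using Python's stdlib `ast` module; the `module_stub` helper and its output format are my own invention, not an existing tool:

```python
import ast

def module_stub(source: str) -> str:
    """Emit a .h-style 'header' for a Python module: top-level
    signatures only, no bodies, so an agent can read the API
    surface without loading the implementation into context."""
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            lines.append(f"def {node.name}({args}): ...")
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}: ...")
    return "\n".join(lines)

print(module_stub("def add(a, b):\n    return a + b\n\nclass Point:\n    pass"))
```

In practice this is roughly what .pyi stub files already are; the missing piece is agents reading stubs by default instead of whole source files.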

  • > This seems to be at odds with the goal of token minimization. Lots of small files that are narrowly scoped means less has to be loaded into context when making a change, right?

    my solution (as someone who's building something tangential) is to use granular levels of scope: there should be an implicit single file that gets generated from a package at a certain phase of the static-tool processing. But the package is still split into files for flexibility and DevEx (developer experience). File/folder organization is super useful for humans. For tooling, the package can be collected together and treated as a single unit, but still decomposed based on things like namespaces and top-level definitions (classes, specifications, etc.). That way the tooling has control over how much context to pass in.

  • I think AST-aware code reading is criminally underused by agents: you don't need a header file if you can see a listing of all the functions in a library.

    Similarly, I don't read the whole file a function is in while editing it in an IDE, why should a coding agent get the whole file polluting its context by default?

    • Check out Ataraxy-Labs/weave for AST-aware git merges.

      But, I wonder, do AST-aware tools cleave to the LLM training manifold the way coding-tutorial slop does?
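The "package collected into a single unit, decomposed by top-level definitions" idea a few comments up could be sketched in Python like this; `package_units` is a hypothetical name, and real tooling would also have to track imports and cross-file references:

```python
import ast
from pathlib import Path

def package_units(pkg_dir: str) -> dict[str, str]:
    """Collect a package into one logical unit, keyed by top-level
    definition, so tooling can decide how much context to pass in.
    Sketch only: ignores imports, nesting, and name collisions."""
    units = {}
    for path in sorted(Path(pkg_dir).rglob("*.py")):
        source = path.read_text()
        for node in ast.parse(source).body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                # get_source_segment recovers the exact source of one definition
                units[f"{path.stem}.{node.name}"] = ast.get_source_segment(source, node)
    return units
```

A tool built on this could hand an agent just the keys (a listing), just one unit, or the whole concatenation, matching the "granular levels of scope" framing.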

Well, Rust does fulfill these to a reasonable degree. There is obvious room for improvement, but the vast majority of new languages don't even try to be a Rust successor. Instead, they take a step back and decide that what Rust is doing is too much, e.g. Zig. It's kind of irritating that everyone and their dog is coming up with a new programming language that barely changes anything when there is so much low-hanging fruit. The vast majority of programming languages people are coming up with could have been language subsets, extensions, or alternative runtimes for existing languages.

> all too human coding agents

There is no actual thought occurring. Arguably, we can say the same about a lot of humans at any given moment, but with machines there never is. It's all statistics.