Comment by gizmo686

2 days ago

For over a decade, my work has involved a project that is almost entirely generated code. Not AI-generated: the actual work of the project is in creating the code generator.

One of the things we learned very quickly was that having generated source code in the same repository as actual source code was not sustainable. The nature of reviewing changes is just too different between them.

Another thing we learned very quickly was that attempting to generate code and then modify the result is not sustainable; nor is aiming for a 100% generated code base. The end result was that we had to significantly rearchitect the project so that we could essentially inject manually crafted code at arbitrary places in the generated code.
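
As a minimal sketch of what that injection can look like (the "# INJECT:" marker syntax and the manual/ snippet directory here are hypothetical, not our actual system): the generator emits named injection points, and a post-pass splices hand-written files in beneath them, so manual code stays out of the generator but lands exactly where it's needed.

    # Sketch: splice hand-written code into generated output at named
    # injection points. Marker syntax and file layout are hypothetical.
    from pathlib import Path

    MARKER = "# INJECT: "

    def splice(generated: str, manual_dir: Path) -> str:
        out = []
        for line in generated.splitlines():
            # Keep the marker line itself so every splice stays auditable.
            out.append(line)
            stripped = line.strip()
            if stripped.startswith(MARKER):
                name = stripped[len(MARKER):].strip()
                snippet = manual_dir / (name + ".py")
                if snippet.exists():
                    out.append(snippet.read_text().rstrip())
        return "\n".join(out) + "\n"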

Another thing we learned is that any change in the code generator needs a feature flag, because someone is always relying on the old behavior.

I think the biggest difference here is that your code generator is probably deterministic and you likely are able to debug the results it produces rather than treating it like a black box.

  • Overloading of the term "generate" is probably creating some confused ideas here. An LLM/agent is a lot more similar to a human in terms of its transformation of input into output than it is to a compiler or code generator.

    I've been working on a recent project with heavy use of AI (probably around 100 hours of long-running autonomous AI sprints over the last few weeks), and if you tried to re-run all of my prompts in order, even using the exact same models with the exact same tooling, it would almost certainly fall apart pretty quickly. After the first few, a huge portion of the remaining prompts would be referencing code that wouldn't exist and/or responding to things that wouldn't have been said in the AI's responses. Meta-prompting (prompting agents to prepare prompts for other agents) would be an interesting challenge to properly encode. And how would human code changes be represented, as patches against code that also wouldn't exist?

    The whole idea also ignores that AI being fast and cheap compared to human developers doesn't make it infinitely fast or free, or put it in the same league of quickness and cheapness as a compiler. Even if this were conceptually feasible, all it would really accomplish is making it so that any new release of a major software project takes weeks (or more) of build time and thousands of dollars (or more) burned on compute.

    It's an interesting thought experiment, but the way I would put it into practice would be to use tooling that includes all relevant prompts / chat logs in each commit message. Then maybe in the future an agent with a more advanced model could go through each commit in the history one by one, take notes on how each change could have been better implemented based on the associated commit message and any source prompts contained therein, use those notes to inform a consolidated set of recommended changes to the current code, and then actually apply the recommendations in a series of pull requests.
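
    As a sketch of the extraction side, assuming prompts were recorded as a (made-up) "Prompt:" git trailer in each commit message:

        # Sketch: walk the history and pull prompt trailers back out so a
        # future agent can review them commit by commit. The "Prompt"
        # trailer key is an invented convention, not an existing standard.
        import subprocess

        def prompts_by_commit():
            log = subprocess.run(
                ["git", "log", "--reverse",
                 "--format=%H%x00%(trailers:key=Prompt,valueonly)%x01"],
                capture_output=True, text=True, check=True,
            ).stdout
            for record in log.split("\x01"):
                if record.strip():
                    sha, _, prompt = record.strip().partition("\x00")
                    yield sha, prompt.strip()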

  • People keep saying this and it doesn't make sense. I review code. I don't construct a theory of mind of the author of the code. With AI-generated code, if it isn't eminently reviewable, I reflexively kill the PR and either try again or change the tasking.

    There's always this vibe that, like, AI code is like an IOCCC puzzle. No. It's extremely boring mid-code. Any competent developer can review it.

    • I assumed they were describing AI itself as a black box (contrasting it with deterministic code generation), not the output of AI.

    • You construct a theory of mind of the author of a work whether you recognize you are doing it or not. There are certain things everyone assumes about code based on the fact that we expect someone who writes code to have simple common sense. Which, of course, LLMs do not.

      When you are talking to a person and interpreting what they mean, you have an inherent theory of mind whether you are consciously thinking "how does this person think" or not. It's how we communicate with other people efficiently and it's one of the many things missing with LLM roulette. It's not that you generate a new "theory of mind" with every interaction. It's not something you have to consciously do (although you can, like breathing).

> One of the things we learned very quickly was that having generated source code in the same repository as actual source code was not sustainable

My rule of thumb is to keep both in the same repo, but treat generated code like binary data. This was informed by a time I was burned by a tooling regression that broke the generated code; the investigation was complicated by having to correlate commits across different repositories.
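
A minimal sketch of that "binary data" treatment, assuming the generated files share a recognizable suffix (the .gen.* naming is made up for illustration):

    # Hypothetical .gitattributes: suppress diffs for generated files and
    # mark them as generated so review tooling collapses them by default.
    *.gen.c  -diff linguist-generated=true
    *.gen.h  -diff linguist-generated=true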

  • I love having generated code in the same repo as the generator, because with every commit I can regenerate the code and compare it to make sure it stays in sync. It then forms something similar to a golden test: if something unexpected changes, it gets noticed in review.
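
    A minimal sketch of that check, e.g. as a CI step (the generator command, output directory, and file list are assumptions for illustration):

        # Sketch: regenerate into a temp dir and compare byte-for-byte with
        # what's committed; any drift fails the build, like a golden test.
        import filecmp, subprocess, sys, tempfile

        def check_in_sync() -> int:
            with tempfile.TemporaryDirectory() as fresh:
                subprocess.run(["./generate.sh", "--out", fresh], check=True)
                _, mismatch, errors = filecmp.cmpfiles(
                    fresh, "generated", ["model.py", "api.py"], shallow=False)
                if mismatch or errors:
                    print("generated code out of sync:", mismatch + errors)
                    return 1
            return 0

        if __name__ == "__main__":
            sys.exit(check_in_sync())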

> One of the things we learned very quickly was that having generated source code in the same repository as actual source code was not sustainable.

Keeping a separate repository with the prompts or other commands is fine, but I find not committing the generated code at all questionable at best.

  • If you can 100% reproduce the same generated code from the same prompts, even 5 years later, given the same versions and everything, then I'd say "Sure, go ahead and don't save the generated code, we can always regenerate it". As someone who spent some time in frontend development, I can say we've been doing it like that for a long time with (MB+) generated code; keeping it in scm just isn't feasible long-term.

    But given this is about LLMs, which people tend to run with temperature > 0, this is unlikely to be true, so I'd really urge anyone to actually store the results (somewhere, maybe not in scm specifically), as otherwise you won't have any idea in the future of what the generated code was.
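
    At minimum I'd store something like a manifest next to each stored result, so you can tell years later exactly what produced it (all field names here are invented):

        # Sketch: record the inputs and settings alongside the output's
        # hash. Field names and layout are invented for illustration.
        import hashlib, json, time

        def write_manifest(prompt: str, output: str, model: str, path: str) -> None:
            manifest = {
                "model": model,
                "temperature": 0.7,  # whatever was actually used
                "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
                "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
                "generated_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            }
            with open(path, "w") as f:
                json.dump(manifest, f, indent=2)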

    • > If you can 100% reproduce the same generated code from the same prompts, even 5 years later

      Reproducible builds with deterministic stacks and local compilers are far from solved. Throwing in LLM randomness just makes not committing the generated code an even spicier proposition.

    • Temperature > 0 isn’t a problem as long as you can specify/save the random seed and everything else is deterministic. Of course, “as long as” is still a tall order here.
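
      Some hosted APIs do expose a best-effort seed. A sketch assuming an OpenAI-style client (and even there, determinism is only promised while the serving backend stays unchanged):

          # Sketch: pin sampling with a seed, assuming an OpenAI-style API.
          # Reproducibility is best-effort; system_fingerprint changes when
          # the serving stack does.
          from openai import OpenAI

          client = OpenAI()
          resp = client.chat.completions.create(
              model="gpt-4o",
              messages=[{"role": "user", "content": "Generate the parser."}],
              temperature=0.7,
              seed=12345,  # same seed + same inputs -> usually same output
          )
          print(resp.system_fingerprint)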

  • I didn't read it as that. If I understood correctly, generated code must be quarantined very tightly, and inevitably you will need to edit/override it; the manner in which you alter it must go through some kind of process, so that the alteration is auditable and can again be clearly distinguished from generated code.

    Tbh, this all sounds very familiar: like classic data management/admin systems for regular businesses. The only difference is that the data is code and the admins are the engineers themselves, so the temptation to "just" change things in place is too great. But I suspect it doesn't scale, is hard to manage, etc.

  • I feel like a compiler is, in a sense, a code generator where you don't commit the actual output

    • > I feel like a compiler is, in a sense, a code generator where you don't commit the actual output

      Compilers are deterministic. Given the same input, you always get the same output, so there's no reason to store the output. If you don't get the same output, we call it a compiler bug!

      LLMs do not work this way.

      (Aside: am I the only one who feels that the entire AI industry is predicated on replacing only development positions? We're looking at, what, $100bn invested, with almost no reduction in customers' operating costs unless the customer has developers.)

There’s a huge difference between deterministic generated code and LLM-generated code. The latter will be different every time, sometimes significantly so. Subsequent prompts would almost immediately be useless: “You did X, but we want Y” would just blow up if, the next time through, the LLM (or the new model you’re trying) doesn’t produce X at all.

> The end result of that was that we had to significantly rearchitect the project for us to essentially inject manually crafted code into arbitrary places in the generated code.

This sounds like putting assembly in C code. What was the input language? These two bits ("Not AI generated", "a feature flag") suggest that the code generator didn't have a natural language frontend, but rather a real programming language frontend.

Did you or anyone else inform management that a code generator is essentially a compiler with extra characters? [0] If yes, then what was their response?

I am concerned that your current/past work might have been to build a Compiler-as-a-Service (CaaS). [1] No shade, I'm just concerned that other managers might read all this and then try to build their own CaaS.

[0] Yes, I'm implying that LLMs are compilers. Altman has played us for fools; he's taught a billion people the worst part of programming: fighting the compiler to give you the output you want.

[1] Compiler-as-a-Service is the future our forefathers couldn't imagine warning us about. LLMs are CaaS's; time is a flat circle; where's the exit?; I want off this ride.

  • The input was a highly structured PDF specification of a family of protocols and formats. Essentially, a real language with very stupid parsing requirements and the occasional typo. The PDF itself was clearly intended for human consumption, but I'm 99% sure that someone somewhere at some point had a machine-readable specification that was used to generate most of the PDF. Sadly, no one seems to know where to even start looking for such a thing.

    > Did you or anyone else inform management that a code generator is essentially a compiler with extra characters?

    The output of the code generator was itself fed into a compiler that we also built; and about half of the codegen team (myself included) were themselves developers for the compiler.

    I think management is still scarred by the 20-year-old M4 monstrosity we are still maintaining because writing a compiler would be "too complex".

I will guess that you are generating orders of magnitude more lines of code with your software than people do when building projects with LLMs. If that is true, I don't think the analogy holds.

Please tell us which company you are working for, so that we don't send our resumes there.

Jokes aside, I have worked on projects where auto-generating code was the chosen solution, and it has always been 100% auto-generated, essentially at compilation time. Any hand-coded stuff needed to handle corner cases or glue pieces together was kept outside of the code generator.