Comment by D-Machine
12 hours ago
It should be a distillation of the session and/or the prompts, at a bare minimum. No, it should not include, e.g., research-type questions, but it should include the prompts the user wrote after reading the answers to those research-type questions, and perhaps some distillation of the links / references surfaced during the research.
Prompts probably should be distilled / summarized, especially if they are research-based prompts, but code-gen prompts should probably be saved verbatim.
Reproducibility is a thing, and though perfect reproducibility isn't desirable, something needs to make up for the fact that vibe-coding is highly inscrutable and hard to review. If the session summary is too vague / over-distilled, bad prompts and assumptions go undocumented, which makes it hard to iterate and improve.
You have the source code though. That is the "reproducibility" bit you need. What extra reproducibility does having the prompts give you? Especially given that AI agents are non-deterministic in the first place.

To me the idea that the prompts and sessions should be part of the commit history is akin to saying that the keystroke logs and commands issued to the IDE should be part of the commit history. Is it important to know that when the foo file was refactored the developer chose to do it by hand vs letting the IDE do it with an auto-refactor command vs just doing a simple find and replace? Maybe it is for code review purposes, but for "reproducibility" I don't think it is.

You have the code that made build X and you have the code that made build X+1. As long as you can reliably recreate X and X+1 from what you have in the code, you have reproducibility.
> You have the source code though. That is the "reproducibility" bit you need.
I am talking about reproducing the (perhaps erroneous) logic or thinking or motivations in cases of bugs, not reproducing outputs perfectly. As you said, current LLM models are non-deterministic, so we can't have perfect reproducibility based on the prompts. But, when trying to fix a bug, having the basic prompts lets us see whether we run into similar issues given a bad prompt. This tells us whether the bad / bugged code was just a random spasm, or something reflecting bad / missing logic in the prompt.
> Is it important to know that when the foo file was refactored the developer chose to do it by hand vs letting the IDE do it with an auto-refactor command vs just doing a simple find and replace? Maybe it is for code review purposes, but for "reproducibility" I don't think it is.
I am really using "reproducibility" more abstractly here, and don't mean perfect reproducibility of the same code. I.e. consider this situation: "A developer said AI wrote this code according to these specs and prompt, which, according to all reviewers, shouldn't produce the errors and bad code we are seeing. Let's see if we can indeed reproduce similar code given their specs and prompt". The less evidence we have of the specifics of a session, the less reproducible their generated code is, in this sense.
It's not reproducible though.
Even with the exact same prompt and model, you can get dramatically different results, especially after a few iterations of the agent loop. Generally you can't even hold those two fixed: most tools don't let you pick the model snapshot and don't let you change the system prompt. You would have to make sure you have the exact same user config too. And once the model runs code, you aren't going to get the same outputs in most cases (there will be dates, logging timestamps, different host names and user names, etc.).
I generally avoid even reading the LLM's own text (and I wish it produced less of it really) because it will often explain away bugs convincingly and I don't want my review to be biased. (This isn't LLM specific though -- humans also do this and I try to review code without talking to the author whenever possible.)
You are talking about documenting the intent of a piece of software, if I understand correctly. But isn't that what READMEs and comments are for?
The source code should be whatever is easiest for a human to understand. Committing AI-generated code without the prompts is like committing compiler-generated machine code without the source.
> It should be a distillation of the session and/or the prompts, at bare minimum.
Huh, I thought that's what the commit message is for.
I mean, sure, a good, detailed commit message is perfectly fine to me in place of the prompts / a session distillation. But I am not holding my breath for vibe-coders to properly review their code and write such a commit message. If they do, great! No need for prompt / session details.
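One possible middle ground between these positions (a sketch, not something anyone in the thread prescribes): keep the commit message as the human-reviewed summary, and attach the verbatim code-gen prompt out-of-band with `git notes`, so it travels with the commit without cluttering the history. The notes ref name (`prompts`), the commit message, and the prompt text below are all hypothetical.

```shell
# Work in a throwaway repo for illustration
dir=$(mktemp -d)
cd "$dir"
git init -q

# The commit message stays a human-written, reviewed summary
git -c user.email=dev@example.com -c user.name=dev \
    commit -q --allow-empty -m "Add CSV parser (AI-assisted)"

# Attach the verbatim prompt on a dedicated notes ref,
# outside the commit message itself
git notes --ref=prompts add -m \
    "Prompt: implement a CSV parser that handles quoted fields" HEAD

# Anyone reviewing later can pull the prompt back up
git notes --ref=prompts show HEAD
```

Notes live on their own ref (`refs/notes/prompts`), so they can be pushed, fetched, or ignored independently of the branch history, which fits the "great if present, not required" stance above.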