Comment by orbital-decay
18 days ago
>It’s not just about non-determinism
I'm very specifically addressing the prompt reproducibility mentioned above, because it's a notorious red herring in these discussions. What you want is correctness, not determinism/reproducibility, which is relatively trivial. (Although, thinking about it more, maybe not that trivial: if you want usable repro in the long run, you'll have to store the model snapshot and the inference code, and make the inference deterministic too.)
>A one word difference in a spec can and frequently does produce unrecognizably different output.
This is well out of scope for reproducibility and doesn't affect it in the slightest. For practical software development it's also a red herring; the real issues are correctness and spec gaming. As long as the output is correct and doesn't circumvent the intention of the spec, prompt instability is unimportant. It's just the ambiguous nature of the domain that LLMs and humans operate in.
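On the "make the inference deterministic too" point: a toy pure-Python sketch (made-up logits, no real model) of why temperature-0 greedy decoding is inherently reproducible while sampling is reproducible only if you also pin the RNG seed. The `decode` helper and its logits are illustrative assumptions, not any real inference API.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a plain list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits, temperature=0.0, seed=None):
    # Toy decoder: temperature 0 means greedy argmax, which is fully
    # deterministic; any temperature > 0 samples from the distribution,
    # so the result is reproducible only if the RNG seed is pinned too.
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = random.Random(seed)
    probs = softmax([x / temperature for x in logits])
    return rng.choices(range(len(logits)), weights=probs)[0]

logits = [1.2, 3.4, 0.7]  # hypothetical next-token scores

# Greedy decoding: same input, same output, every time.
assert decode(logits) == decode(logits)

# Sampling: reproducible only with a fixed seed.
assert decode(logits, temperature=0.8, seed=42) == decode(logits, temperature=0.8, seed=42)
```

Even this only covers the sampling step; real repro would also need the exact model weights and inference code, since kernel or version changes can shift the logits themselves.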
Well, if you want to use it as a high-level language, where you check in the spec and regenerate the code, then prompt instability/chaotic output makes that infeasible.
You can’t just tell users, “sorry, there are a million tiny differences all over the app every time we change the slightest thing; that’s just the ambiguous nature of reality.”
>where you check in the spec and regenerate the code then prompt instability/chaotic output makes that infeasible
What? Why would you want to write the code anew? Identify the changes in the spec and bring the existing code in line with them.
That’s the whole thesis of the article: using an LLM as a high-level language.
>The codebase should be reconstructable from the documentation