
Comment by diggan

2 days ago

If you can 100% reproduce the same generated code from the same prompts, even 5 years later, given the same versions and everything, then I'd say "Sure, go ahead and don't save the generated code, we can always regenerate it". As someone who spent some time in frontend development, we've been doing it like that for a long time with (MB+ of) generated code; keeping it in scm just isn't feasible long-term.

But given this is about LLMs, which people tend to run with temperature > 0, this is unlikely to be true, so I'd really urge anyone to actually store the results (somewhere, maybe not in scm specifically), as otherwise you won't have any idea what the code was in the future.

> If you can 100% reproduce the same generated code from the same prompts, even 5 years later

Reproducible builds with deterministic toolchains and local compilers are far from a solved problem. Throwing in LLM randomness just makes for a spicier environment in which to not commit the generated code.

Temperature > 0 isn’t a problem as long as you can specify/save the random seed and everything else is deterministic. Of course, “as long as” is still a tall order here.
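To make the point concrete, here is a minimal sketch of what "save the seed and everything else" would mean in practice. The `generate` function below is a hypothetical stand-in for a model call that really is deterministic given (prompt, seed, temperature); real hosted LLMs rarely guarantee this, which is exactly the problem being discussed.

```python
import hashlib
import random

def generate(prompt: str, seed: int, temperature: float) -> str:
    # Hypothetical stand-in for a fully deterministic model call:
    # identical inputs always produce identical output.
    rng = random.Random(repr((prompt, seed, temperature)))
    return "".join(rng.choice("abcdef ") for _ in range(40))

def record(prompt: str, seed: int, temperature: float) -> dict:
    # Store everything needed to regenerate the code later,
    # plus a hash so the regenerated output can be verified.
    output = generate(prompt, seed, temperature)
    return {
        "prompt": prompt,
        "seed": seed,
        "temperature": temperature,
        "sha256": hashlib.sha256(output.encode()).hexdigest(),
    }

rec = record("write a sort function", seed=42, temperature=0.8)

# Years later: regenerate from the saved inputs and check the hash
# instead of having stored the generated code itself.
regenerated = generate(rec["prompt"], rec["seed"], rec["temperature"])
assert hashlib.sha256(regenerated.encode()).hexdigest() == rec["sha256"]
```

If any input to `generate` (model weights, batch neighbors, kernel versions) is not captured in the record, the final assertion fails, and the only safe fallback is storing the output itself.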

  • My understanding is that the implementation of modern hosted LLMs is nondeterministic even with a known seed, because the generated results are sensitive to a number of other factors including, but not limited to, which other prompts are running in the same batch.

  • Have any of the major hosted LLMs ever shared the temperature parameters that prompts were generated with?