Comment by dragonwriter
1 day ago
My understanding is that the implementation of modern hosted LLMs is nondeterministic even with a known seed, because the generated results are sensitive to a number of other factors, including, but not limited to, which other prompts are running in the same batch (batch composition can change kernel selection and floating-point accumulation order, and floating-point addition is not associative).
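A rough way to see this for yourself is to replay the identical request N times and diff the outputs. A minimal sketch, assuming the OpenAI Python SDK with OPENAI_API_KEY set; the model name, prompt, and seed value are placeholder choices, and the seed parameter is documented as best-effort only:

    # Replay the identical request N times and count distinct completions.
    # Even with temperature=0 and a fixed seed, hosted endpoints can
    # return different completions run to run.
    from collections import Counter
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def sample(n: int = 10) -> Counter:
        outputs = Counter()
        for _ in range(n):
            resp = client.chat.completions.create(
                model="gpt-4o-mini",  # placeholder model
                messages=[{"role": "user",
                           "content": "Name one prime number and explain why."}],
                temperature=0,
                seed=42,  # accepted, but documented as best-effort
            )
            outputs[resp.choices[0].message.content] += 1
        return outputs

    for text, count in sample().items():
        print(f"{count:2d}x {text[:60]!r}")

If this prints more than one distinct output, the endpoint is nondeterministic for your request regardless of the sampling settings.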
Gemini, for example, launched implicit caching on or about 2025-05-08 (https://news.ycombinator.com/item?id=43939774); from that thread, on the same point:
> Does this make it appear that the LLM's responses converge on one answer when actually it's just caching?
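One way to probe that question: prepend a unique nonce to each request so a prefix cache can't hit, and compare output variability with and without it. A minimal sketch, assuming the google-genai Python SDK with GEMINI_API_KEY set; the model name and prompt are placeholders, and note that Gemini's implicit caching reportedly only kicks in above a minimum prompt length, so a realistic test would use a much longer shared prefix:

    # Cache-busting probe: a unique nonce per request defeats prefix
    # caching; if answers only converge without the nonce, caching (not
    # true convergence) is the likelier explanation.
    import uuid
    from google import genai

    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    PROMPT = "Summarize the plot of Hamlet in one sentence."

    def probe(bust_cache: bool, n: int = 5) -> int:
        answers = set()
        for _ in range(n):
            prefix = f"[nonce {uuid.uuid4()}] " if bust_cache else ""
            resp = client.models.generate_content(
                model="gemini-2.0-flash",  # placeholder model
                contents=prefix + PROMPT,
            )
            answers.add(resp.text)
        return len(answers)  # number of distinct answers seen

    print("distinct answers, cacheable prompt:", probe(bust_cache=False))
    print("distinct answers, cache-busted:    ", probe(bust_cache=True))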