
Comment by jagged-chisel

3 days ago

> Reproducible would be great

Wouldn’t it be great? I’m still waiting for reproducibility from LLMs.

Can you reproduce irreproducibility?

Give me a question that the LLM answers vastly differently across runs.

I keep hearing how it's dumb and wrong, but no one ever shares the chat or the prompt.

  • Try this with ChatGPT, Grok, or Claude:

    How many days of the week contain the letter d?

    ChatGPT and Grok both answer 3 for me; Claude answers 6.

    • I just tried ChatGPT alone, twice: once in the web interface in a Firefox private window, and once in a Chrome incognito window. Both times I asked the identical question: "How many names of the days of the week contain the letter D?"

      In Firefox I got 6. In Chrome I got 7. LLMs are not even self-consistent.

      I have the screenshots if anyone cares.
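For what it's worth, the question itself has a deterministic answer you can check in a few lines. A minimal sketch, assuming the standard English day names and a case-insensitive match (the `days_containing` helper is hypothetical, written only for this illustration):

```python
# Standard English day names (assumption: Monday-first ordering, English only)
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def days_containing(letter: str, case_sensitive: bool = False) -> list[str]:
    """Return the day names that contain `letter` at least once."""
    if case_sensitive:
        return [name for name in DAYS if letter in name]
    # Case-insensitive: compare lowercase forms of both sides
    return [name for name in DAYS if letter.lower() in name.lower()]


matches = days_containing("d")
print(len(matches), matches)  # every name ends in "day", so all seven match
print(len(days_containing("D", case_sensitive=True)))  # no name has an uppercase D
```

Read case-insensitively, the answer is 7, because every name ends in "day". Only a strictly case-sensitive reading of "D" gives a different answer (0). Neither 3 nor 6 is correct under either reading.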