It's unclear why the next-token probabilities given the context "please pick a random number" couldn't be distributed uniformly across all the possible numbers. After all, it's entirely possible for an LLM to return 10 logits of roughly the same value for the digits 0..9, for example.
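To make the point concrete, here is a minimal sketch of what "10 logits of roughly the same value" would mean after softmax. The logit values are made up for illustration and do not come from any real model, and sampling at temperature 1 is an assumption.

```python
import math
import random

def softmax(logits):
    """Convert logits to probabilities (max subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the tokens "0".."9"; the values are invented.
logits = [2.00, 2.01, 1.99, 2.00, 2.02, 1.98, 2.00, 2.01, 1.99, 2.00]
probs = softmax(logits)

# Sample the "next token" many times, as temperature-1 sampling would.
rng = random.Random(0)  # fixed seed so the run is repeatable
samples = rng.choices(range(10), weights=probs, k=10_000)
counts = [samples.count(d) for d in range(10)]

print([round(p, 3) for p in probs])
print(counts)
```

With logits this close together, every digit ends up with a probability near 0.1, so sampled picks would be close to uniform. Note that greedy decoding (always taking the argmax) would still return the same digit every time, so the uniformity only shows up when the output is actually sampled.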