Comment by teej
3 days ago
Fair dice rolls are not an objective that cloud LLMs are optimized for. You should assume that LLMs cannot perform this task.
This is a problem when people naively use "give an answer on a scale of 1-10" in their prompts. LLMs are biased towards particular numbers (like humans!) and cannot linearly map an answer to a scale.
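This bias is easy to check empirically. A minimal sketch: collect the model's answers to a repeated "pick a number from 1 to 10" prompt (the sampling loop that queries a model is not shown and the function name is my own), then compute a Pearson chi-square statistic against the uniform distribution. A large statistic means the answers are far from uniform.

```python
from collections import Counter

def chi_square_uniform(samples, categories):
    """Pearson chi-square statistic of `samples` against a uniform
    distribution over `categories`. Near 0 means roughly uniform;
    compare against the chi-square critical value with
    len(categories) - 1 degrees of freedom to test formally."""
    n = len(samples)
    expected = n / len(categories)
    counts = Counter(samples)
    return sum(
        (counts.get(c, 0) - expected) ** 2 / expected
        for c in categories
    )

# A perfectly uniform sample scores 0.0:
print(chi_square_uniform([1, 2, 3, 4, 5, 6], range(1, 7)))   # → 0.0

# A model that always answers "7" on a 1-10 scale scores very high:
print(chi_square_uniform([7] * 10, range(1, 11)))            # → 90.0
```

In practice you would run a few hundred prompts per model and temperature setting; published informal experiments consistently show sharp spikes on a handful of "favorite" numbers rather than a flat distribution.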
It's extremely concerning when teams do this in a context like medicine. Asking an LLM "how severe is this condition" on a numeric scale is fraudulent and dangerous.
This week I was in a meeting for a rather important scientific project at the university, and I asked the other participants “can we somehow reliably cluster this data to try to detect groups of similar outcomes?” to which a colleague promptly responded “oh yeah, chatGPT can do that easily”.
I guess he's right: it will be easy, and it will look accurate. Look being the operative word.
So that’s it then? We replace every well-understood, objective algorithm with well-hidden, fake, superficial surrogate answers from an AI?
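For contrast, here is what a "well-understood, objective algorithm" for that clustering question looks like: plain Lloyd's k-means, a few dozen lines with no hidden behavior, deterministic given a seed, and inspectable at every step. This is a minimal sketch, not production code (use a vetted library for real work).

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's k-means: repeatedly assign points to their nearest
    center, then move each center to the mean of its cluster.
    Every step is transparent and reproducible."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)

    def nearest(p):
        # Index of the center closest to p (squared Euclidean distance).
        return min(range(k),
                   key=lambda c: sum((a - b) ** 2
                                     for a, b in zip(p, centers[c])))

    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        # Recompute centers as cluster means; keep old center if empty.
        new_centers = [
            tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers

    return [nearest(p) for p in points], centers

# Two well-separated groups are recovered cleanly:
pts = [(0.0, 0.0), (0.1, 0.2), (10.0, 10.0), (10.1, 9.9)]
labels, _ = kmeans(pts, k=2)
print(labels)  # the first two points share one label, the last two the other
```

The point is not that k-means is always the right choice, but that its objective (minimizing within-cluster variance) is stated, its failure modes are documented, and the same inputs always give the same answer, none of which holds for "ask chatGPT".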
3 replies →
It'll also give you different results based on logically irrelevant numbers that happen to appear elsewhere in the collaborative fiction document.