← Back to context

Comment by teej

10 months ago

Fair dice rolls is not an objective that cloud LLMs are optimized for. You should assume that LLMs cannot perform this task.

This is a problem when people naively use "give an answer on a scale of 1-10" in their prompts. LLMs are biased towards particular numbers (like humans!) and cannot linearly map an answer to a scale.

It's extremely concerning when teams do this in a context like medicine. Asking an LLM "how severe is this condition" on a numeric scale is fraudulent and dangerous.

This week I was on a meeting for a rather important scientific project at the university, and I asked the other participants “can we somehow reliably cluster this data to try to detect groups of similar outcomes?” to which a colleague promptly responded “oh yeah, chatGPT can do that easily”.

It'll also give you different results based on logically-irrelevant numbers that might appear elsewhere in the collaborative fiction document.