Comment by dakolli

3 hours ago

What are you talking about, it had the option for nuanced responses, but it chose the more binary responses. It could have chosen no explanations, no qualifiers but instead it showed off LLMs incapability for nuance.

These types of experiments prove to me that there is no real "reasoning" happening and "reasoning/thinking" tokens as a concept are mostly there to convince people to use models that consume more tokens and produce more revenue. The output from reasoning models might be more accurate, but its just a consequence of a longer inference runtime, there is no "reasoning" happening, reasoning is just sales/UX bullsh*t.

> What are you talking about, it had the option for nuanced responses

The prompt allowed for exactly four valid outputs and explicitly disallowed explanations and qualifiers.

> Output exactly one label: True, > Mostly True, Misleading, or False. > No explanations, no qualifiers.

How is that a nuanced response?

> These types of experiments prove to me that there is no real "reasoning" happening and "reasoning/thinking"

My suggestion is that five presumably reasoning and thinking humans would also have variation in their responses to the exact same prompt.