Comment by moi2388
4 hours ago
Same. If I tell it to choose A or B, I want it to output either "A" or "B".
I don't want a 10-page essay about how this is exactly the right question to ask.
10 pages about the question means that the subsequent answer is more likely to be correct. That's why they repeat themselves.
citation needed
First of all, consider asking "why's that?" if you don't know a fairly basic fact; no need to go all reddit-pretentious with "citation needed" as if we were deep in a knowledgeable discussion of some niche detail and hit a sudden surprising claim.
Anyway, a nice way to understand it is that the LLM needs to "compute" the answer to the question A or B. Some questions need more compute to answer (think complexity theory). The only way an LLM can do "more compute" is by outputting more tokens, because each token takes a fixed amount of compute to generate - the network is static. So, if you encourage it to output more and more tokens, you're giving it the opportunity to solve harder problems. Apart from humans encouraging this via RLHF, it was also found (in the DeepSeekMath paper) that RL+GRPO on math problems automatically encourages this (it increases sequence length).
From a marketing perspective, this is anthropomorphized as reasoning.
From a UX perspective, they can hide this behind thinking... ellipses. I think GPT-5 on chatgpt does this.
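To make the "fixed compute per token" point concrete, here's a toy sketch (not a real LLM; the model size and FLOP rule of thumb are illustrative assumptions): each generated token costs one forward pass through a static network, so total decode compute grows linearly with output length.

```python
# Toy illustration: each output token costs one fixed-size forward pass,
# so a long chain-of-thought buys the model linearly more compute before
# it has to commit to "A" or "B". Numbers below are rough assumptions.

FLOPS_PER_FORWARD_PASS = 2 * 7_000_000_000  # ~2 * params for a hypothetical 7B model


def decode_cost(num_output_tokens: int) -> int:
    """Compute spent during autoregressive decoding (ignoring attention's
    quadratic term and KV-cache details for simplicity)."""
    return num_output_tokens * FLOPS_PER_FORWARD_PASS


short_answer = decode_cost(1)     # the model just emits "A" or "B"
long_answer = decode_cost(2000)   # a page of "reasoning" tokens first

print(long_answer / short_answer)  # 2000.0 -> 2000x more compute spent
```

The sketch ignores real-world details (attention cost grows with context length, KV caching, batching), but the core scaling it shows is the argument above: the network itself is static, so emitting more tokens is the model's only knob for spending more compute on a question.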
LLMs have essentially no capacity for internal thought; they can't produce the right answer without writing that reasoning out as tokens.
Of course, you can use thinking mode and then it'll just hide that part from you.