Comment by moi2388
4 hours ago
Same. If I tell it to choose A or B, I want it to output either "A" or "B".
I don't want a 10-page essay about how this is exactly the right question to ask.
10 pages about the question means that the subsequent answer is more likely to be correct. That's why they repeat themselves.
citation needed
First of all, consider asking "why's that?" if you don't know a fairly basic fact; no need to go all reddit-pretentious with "citation needed" as if we were deep in a knowledgeable discussion of some niche detail and hit a sudden surprising claim.
Anyway, a nice way to understand it is that the LLM needs to "compute" the answer to the question A or B. Some questions need more compute to answer (think complexity theory). The only way an LLM can do "more compute" is by outputting more tokens, because each token takes a fixed amount of compute to generate - the network is static. So, if you encourage it to output more and more tokens, you're giving it the opportunity to solve harder problems. Apart from humans encouraging this via RLHF, it was also found (in the DeepSeekMath paper) that RL+GRPO on math problems automatically encourages this (it increases sequence length).
From a marketing perspective, this is anthropomorphized as reasoning.
From a UX perspective, they can hide this behind thinking... ellipses. I think GPT-5 on chatgpt does this.
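To make the "fixed compute per token" point concrete, here's a toy sketch (not a real LLM; the model size and FLOP rule of thumb are illustrative assumptions): each generated token costs one forward pass through a static network, so total decode compute grows linearly with output length.

```python
# Toy illustration: each output token costs one fixed-size forward pass,
# so a long chain-of-thought buys the model linearly more compute before
# it has to commit to "A" or "B". Numbers below are rough assumptions.

FLOPS_PER_FORWARD_PASS = 2 * 7_000_000_000  # ~2 * params for a hypothetical 7B model


def decode_cost(num_output_tokens: int) -> int:
    """Compute spent during autoregressive decoding (ignoring attention's
    quadratic term and KV-cache details for simplicity)."""
    return num_output_tokens * FLOPS_PER_FORWARD_PASS


short_answer = decode_cost(1)     # the model just emits "A" or "B"
long_answer = decode_cost(2000)   # a page of "reasoning" tokens first

print(long_answer / short_answer)  # 2000.0 -> 2000x more compute spent
```

The sketch ignores real-world details (attention cost grows with context length, KV caching, batching), but the core scaling it shows is the argument above: the network itself is static, so emitting more tokens is the model's only knob for spending more compute on a question.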
LLMs have essentially no capacity for internal thought; they can't produce the right answer without writing that reasoning out as tokens.
Of course, you can use thinking mode and then it'll just hide that part from you.