Comment by bjourne

19 hours ago

LLMs work by generating the most likely continuation of a prompt. But they can also generate multiple likely continuations. This creates multiple branches, which in turn can generate even more branches. The LLM can then evaluate the branches, prune the unpromising ones, and merge the best ones. More branches mean more tokens, which means more effort.
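A toy sketch of the branch-and-prune search described above, under loud assumptions: `propose` and `score` are hypothetical stand-ins for the model and an LLM judge, merging is left out, and nothing here claims this is how any provider actually implements reasoning effort (a point disputed further down the thread). It only illustrates that widening the search multiplies the number of continuations generated.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins (not a real API): a "model" proposing continuations
# and a "judge" scoring a whole branch.
Propose = Callable[[str], List[str]]
Score = Callable[[str], float]


def branch_and_prune(prompt: str, propose: Propose, score: Score,
                     width: int, depth: int) -> Tuple[str, int]:
    """Expand every kept branch, score all children, keep the top `width`.

    Returns the best final branch and the number of continuations generated,
    used here as a rough proxy for token cost.
    """
    beams = [prompt]
    generated = 0
    for _ in range(depth):
        children = []
        for beam in beams:
            for cont in propose(beam):
                children.append(beam + cont)
                generated += 1
        # Prune: keep only the `width` highest-scoring branches.
        children.sort(key=score, reverse=True)
        beams = children[:width]
    return beams[0], generated


if __name__ == "__main__":
    propose = lambda s: ["a", "b", "c"]  # toy branching factor of 3
    score = lambda s: s.count("a")       # toy judge: prefers more "a"s
    for width in (1, 2, 3):
        best, cost = branch_and_prune("", propose, score, width=width, depth=2)
        print(width, best, cost)  # prints "1 aa 6", "2 aa 9", "3 aa 12"
```

In this toy setup every width finds the same best branch, but the cost grows with width, which is the "more branches, more tokens" trade-off in its simplest form.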


simianwords 18 hours ago

This has nothing to do with thinking effort, however.

bjourne 17 hours ago
Yes, it does. Breadth of search is exactly what the effort setting controls.
- pyentropy 16 hours ago
  
  LLM-judge/parallel branching ≠ multi-token prediction ≠ reasoning effort.
  See https://developers.openai.com/cookbook/articles/openai-harmo... and src/openai/types/shared/reasoning_effort.py