
Comment by Jensson

1 year ago

> especially when evaluated on the first snap answer

The whole point of o1 is that it wasn't "the first snap answer": it wrote half a page internally before giving the same wrong answer.

Is that really its internal 'chain of thought', or is it a post-hoc justification generated afterward? Do LLMs have a chain of thought like this at all, or are they just convincingly mimicking what a human might say if asked to justify an opinion?

  • It's slightly stranger than that, as both are true. It's already baked into the model, but chain of thought does improve reasoning; you only have to look at maths problems. A short guess would be wrong, but the model gets it right if asked to break the problem down and reason through it (harder to see nowadays, as it has access to calculators). See the sketch below.
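
A minimal sketch of the prompting difference the comment describes, using the OpenAI Python client: one request asks for a snap answer, the other asks the model to break the problem down first. The model name, system instructions, and the arithmetic question are illustrative placeholders, not anything from the thread.

```python
# Sketch: snap answer vs. chain-of-thought prompting on a maths problem.
# Assumes the `openai` package and an API key in OPENAI_API_KEY; the model
# name "gpt-4o-mini" and the example question are placeholders.
from openai import OpenAI

client = OpenAI()
question = "What is 17 * 24 - 138?"

def ask(instruction: str) -> str:
    """Send the question with a given system instruction and return the reply text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": instruction},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# "Snap answer": just the number, no working shown.
snap = ask("Answer with only the final number. Do not show any working.")

# Chain of thought: break the problem down step by step before answering.
cot = ask("Break the problem down and reason step by step, then give the final number.")

print("Snap answer:", snap)
print("Step-by-step answer:", cot)
```

In practice the step-by-step prompt tends to be more reliable on multi-step arithmetic, which is the effect being described; as the comment notes, the gap is harder to see now that models can hand such questions off to a calculator tool.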