Comment by joelburget

1 year ago

o1 is an application of the Bitter Lesson. To quote Sutton: "The two methods that seem to scale arbitrarily in this way are *search* and learning." (emphasis mine -- in the original, Sutton also emphasized learning).

OpenAI and others have previously pushed the learning side, while neglecting search. Now that gains from adding compute at training time have started to level off, they're adding compute at inference time.
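For concreteness, here's a minimal sketch of one way to spend compute at inference time: sample several candidates and keep the one a scorer likes best (best-of-n). The `generate` and `score` functions here are stand-ins I'm assuming for illustration, not any real API:

```python
import random

def best_of_n(prompt, generate, score, n=16):
    """Spend more inference compute by sampling n candidate
    completions and returning the best-scoring one."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)

# Toy stand-ins so the sketch runs end to end.
generate = lambda p: f"{p} -> answer {random.random():.3f}"
score = lambda c: float(c.rsplit(" ", 1)[-1])

print(best_of_n("2+2?", generate, score, n=8))
```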

I think the key part of the Bitter Lesson is that a (scalable) ability to learn from data should be favored over built-in biases.

There are at least three major built-in biases in o1:

- specific reasoning heuristics hard-coded into the RL decision-making

- the architectural split between the pre-trained LLM and what appears to be a symbolic agent calling it

- the reliance on one-time, SGD-driven learning (common to all these pre-trained transformers)

IMO search (reasoning) should be an emergent behavior of a predictive architecture capable of continual learning: chained what-if prediction.
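To illustrate what I mean by chained what-if prediction, here's a toy sketch (an interface I'm assuming, not anyone's actual system): a learned model `predict(state, action)` gets rolled forward over candidate actions, and "search" is nothing more than that chaining.

```python
def rollout_value(state, actions, predict, depth):
    """Best predicted return reachable within `depth` chained
    what-if steps through the learned model."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for action in actions:
        next_state, reward = predict(state, action)
        best = max(best, reward + rollout_value(next_state, actions, predict, depth - 1))
    return best

def choose_action(state, actions, predict, depth=3):
    """Search as emergent behavior: imagine each action's
    consequences, then pick the best imagined future."""
    def value(action):
        next_state, reward = predict(state, action)
        return reward + rollout_value(next_state, actions, predict, depth - 1)
    return max(actions, key=value)

# Toy world model: walk toward position 5 on a number line.
predict = lambda s, a: (s + a, -abs((s + a) - 5))
print(choose_action(0, [-1, +1], predict))  # -> +1
```

The point of the sketch is that nothing here is a hand-coded reasoning heuristic; if `predict` keeps improving from experience, the quality of the "search" improves with it.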