Comment by skepticATX
4 days ago
OpenAI’s systems haven’t been pure language models since the o models though, right? Their RL approach may very well still generalize, but it’s not just a big pre-trained model that is one-shotting these problems.
The key difference is that they claim to have not used any verifiers.
What do you mean by “pure language model”? The reasoning step is still just the LLM spitting out tokens, and this was confirmed by DeepSeek replicating the o models. According to the OpenAI researchers, there’s also no proof verifier or anything similar running alongside it.
If you mean pure as in there’s no additional training beyond the pretraining, I don’t think any model has been pure since GPT-3.5.
With local models, you can get just the pretrained versions, with no RLHF. IIRC both Llama and Gemma make them available.