Comment by skepticATX

4 days ago

OpenAI’s systems haven’t been pure language models since the o models though, right? Their RL approach may very well still generalize, but it’s not just a big pre-trained model that is one-shotting these problems.

The key difference is that they claim to have not used any verifiers.

2 comments


beering 4 days ago

What do you mean by “pure language model”? The reasoning step is still just the LLM spitting out tokens, and this was confirmed by DeepSeek replicating the o models. According to the OpenAI researchers, there's also no proof verifier or anything similar running alongside it.

If you mean “pure” as in no additional training beyond pretraining, I don’t think any model has been pure since GPT-3.5.

gallerdude 4 days ago

With local models, you can get just the pretrained versions, with no RLHF. IIRC both Llama and Gemma make them available.