
Comment by jychang

8 hours ago

Yeah, that was tried. It was called GPT-4.5 and it sucked, despite being 5-10T params in size. All the AI labs gave up on pretrain-only scaling after that debacle.

GPT-4.5 is still good at rote memorization stuff, but that's not surprising. In the same way, GPT-3 at 175B knows way more facts than Qwen3 4B, but the latter is smarter in every other way. GPT-4.5 had a few advantages over other SOTA models at the time of release, but it quickly lost those advantages. Claude Opus 4.5 nowadays handily beats it at writing, philosophy, etc; and Claude Opus 4.5 is merely a ~160B active param model.

Maybe you are confused, but GPT-4.5 had all the same "morality guards" as OAI's other models, and it was clearly RL'd with the same "user first" goals.

True, it was a massive model, but my comment isn't really about scale so much as it is about bending will.

Also, the model size you reference refers to the memory footprint of the parameters, not the actual parameter count. The author postulates a lower bound of 800B parameters for Opus 4.5.
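
To make the footprint-vs-parameter-count distinction concrete, here's a rough back-of-the-envelope sketch. The bytes-per-parameter figures are the standard ones for common precisions (fp32 = 4 B, bf16 = 2 B, int8 = 1 B, int4 = 0.5 B); the 800B count is just the lower bound mentioned above, used for illustration, not a confirmed size for any model.

```python
# Sketch: how parameter count maps to weight memory at different precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_footprint_gb(num_params: float, dtype: str) -> float:
    """Approximate weight memory in GB for a given parameter count and precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

if __name__ == "__main__":
    params = 800e9  # hypothetical 800B-parameter model
    for dtype in BYTES_PER_PARAM:
        gb = weight_footprint_gb(params, dtype)
        print(f"{params / 1e9:.0f}B params @ {dtype}: ~{gb:,.0f} GB")
```

Running it shows that a "160" measured in GB and a "160B" measured in parameters are very different quantities: 800B params is ~1,600 GB at bf16 and still ~400 GB even at int4.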