Comment by jychang

3 hours ago

Yeah, that was tried. It was called GPT-4.5 and it sucked, despite being 5-10T params in size. All the AI labs gave up on pretrain-only scaling after that debacle.

GPT-4.5 is still good at rote memorization stuff, but that's not surprising. In the same way, GPT-3 at 175B knows way more facts than Qwen3 4B, but the latter is smarter in every other way. GPT-4.5 had a few advantages over other SOTA models at the time of release, but it quickly lost them. Claude Opus 4.5 nowadays handily beats it at writing, philosophy, etc.; and Claude Opus 4.5 is merely a ~160B active-param model.