Comment by rkachowski

3 years ago

is there some kind of convention that defines what specifically constitutes GPT-n or does this just mean "we're not working on the successor to GPT-4 yet"?

3 comments

rkachowski

ilaksh 3 years ago

There may be conventions but in no way can anyone force them to follow them. It's just a name for a release. They absolutely are working on the successor models and have stated they plan to release a model by June. Whether they are working on a new architecture or training running, they certainly have experiments, but who knows how serious they are.

Regardless they can and will call future models anything they want. They could easily just decide that the minor improvements that come out in a few months are called GPT-4.2 and the major new training run is called GPT-4.5 instead of GPT-5.

dougmwne 3 years ago

No, it is just an arbitrary version number for this series of models from OpenAI. They will flip to 5 when they make an architecture change that will force them to begin training from scratch. Until then they will continue to produce more refined versions of 4, potentially more general training or fine-tuned task-oriented training.

PeterisP 3 years ago

The way it currently works, there is a quite clear boundary, as all the smaller iterations are based on something of a fixed size that was expensively pretrained, and then have either finetuned weights or some extra layers on top, but the core model structure and size can't be changed without starting from scratch.

So if some particular GPT-4 improved successor is based on the GPT-4 core transformer size and pretrained parameters then we'd call it GPT-4.x, but if some other GPT-4 successor is a larger core model (which inevitably also means it's re-trained from scratch) then we'd call it GPT-5, no matter if its observable performance is better or worse or comparable to the tweaked GPT-4.x options.