Comment by PeterisP

3 years ago

The way it currently works, there is a fairly clear boundary: all the smaller iterations are built on a fixed-size core that was expensively pretrained, and then get either finetuned weights or some extra layers on top, but the core model's structure and size can't be changed without starting from scratch.
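
To make that boundary concrete, here's a minimal sketch (in a PyTorch style; the class names and sizes are hypothetical, not anything OpenAI has published) of a "4.x-style" iteration: the expensively pretrained core is frozen, and only a small new head on top gets trained. Changing the core's depth or width isn't possible without redoing the pretraining.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for an expensively pretrained core transformer;
# its structure (depth, width) is fixed once pretraining is done.
class PretrainedCore(nn.Module):
    def __init__(self, d_model=768):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=d_model, nhead=12, batch_first=True),
            num_layers=12,
        )

    def forward(self, x):
        return self.encoder(x)

# A "4.x-style" iteration: freeze the core, train only a small extra layer on top.
class FinetunedVariant(nn.Module):
    def __init__(self, core, d_model=768, n_outputs=2):
        super().__init__()
        self.core = core
        for p in self.core.parameters():
            p.requires_grad = False          # pretrained weights stay fixed
        self.head = nn.Linear(d_model, n_outputs)  # only this part is trained

    def forward(self, x):
        return self.head(self.core(x).mean(dim=1))

core = PretrainedCore()            # imagine this was loaded from a checkpoint
variant = FinetunedVariant(core)   # cheap to train relative to the core
```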

So if a particular improved successor of GPT-4 is based on the GPT-4 core transformer's size and pretrained parameters, then we'd call it GPT-4.x, but if some other successor uses a larger core model (which inevitably also means it's retrained from scratch), then we'd call it GPT-5, regardless of whether its observable performance is better than, worse than, or comparable to the tweaked GPT-4.x options.