Comment by qumpis
3 years ago
I wonder why it's slower at inference time then (for members using the web UI). Or rather, if it's similar in size to GPT-3, how is GPT-3 optimized in a way that GPT-4 isn't or can't be?
I'd expect that by now we would enjoy similar speeds, but this hasn't happened yet.
GPT-4 is the same speed as legacy GPT-3 ChatGPT for me. It's only occasionally slower, which I expect is due to load and not it being larger.
Interesting. I remember that when the ChatGPT speedup happened, the API prices dropped by around 10x, so I'd imagine there were some tricks to make it run faster.
If they still haven't applied those tricks to GPT-4, it would be a pleasant surprise (to me) to see it already running at speeds similar to ChatGPT. It'd be a great achievement if they really packed that much performance into a similar architecture (say, by just training longer).
The speed-up of the free and default ChatGPT happened because they switched it from the full-size GPT-3.5 to GPT-3.5-Turbo, which is likely a finetune of the roughly 10x smaller GPT-3 Curie.
If you have ChatGPT Plus, you can choose "Legacy" from the drop-down to get the smarter (and slower) 175B-parameter version of GPT-3.5. That version is the same speed as GPT-4 when load is low (early morning EST), which lends credence to the theory that GPT-4 is the same size as an overparametrized GPT-3.
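The size/speed reasoning above can be sketched with a back-of-envelope calculation. Assuming batch-1 autoregressive decoding is memory-bandwidth bound (every generated token requires streaming all weights through the accelerator once), per-token throughput scales roughly inversely with parameter count, so a ~10x smaller model decodes ~10x faster. All concrete numbers below (fp16 weights, 2 TB/s aggregate bandwidth, the exact parameter counts) are illustrative assumptions, not OpenAI's actual deployment:

```python
# Back-of-envelope: why a ~10x smaller model can decode ~10x faster,
# under the assumption that decoding is memory-bandwidth bound.

def tokens_per_second(n_params: float, bytes_per_param: float,
                      bandwidth_gbps: float) -> float:
    """Rough upper bound on batch-1 decode speed: bandwidth / bytes-per-token."""
    bytes_per_token = n_params * bytes_per_param  # all weights read once per token
    return (bandwidth_gbps * 1e9) / bytes_per_token

# Hypothetical setup: 175B model vs. a 10x smaller 17.5B "Curie-class" model,
# both fp16 (2 bytes/param) on hardware with ~2 TB/s memory bandwidth (assumed).
big = tokens_per_second(175e9, 2, 2000)
small = tokens_per_second(17.5e9, 2, 2000)
print(f"175B: ~{big:.1f} tok/s, 17.5B: ~{small:.1f} tok/s")
```

On these assumed numbers the small model comes out exactly 10x faster, which is consistent with the ~10x API price drop mentioned above; real serving stacks (batching, quantization, speculative decoding) change the constants but not the rough scaling.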