Comment by iliane5

3 years ago

I bet they’re not saying how big of a model GPT-4 is because it’s actually much smaller than we would expect.

ChatGPT is IMO a heavily fine-tuned Curie-sized model (same price via the API, plus less cognitive capacity than even text-davinci-003), so it would make sense that a heavily fine-tuned Davinci-sized model would yield similar results to GPT-4.

I wouldn't bet on their pricing being indicative of their costs. If MSFT wants the ChatGPT API to be a success and is willing to subsidize it, that's just how it is.
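As a rough sketch of the pricing argument: the per-1K-token rates below are the publicly listed OpenAI prices from around early 2023 as I recall them, so treat the exact figures as assumptions, not a source.

    # Per-1K-token API prices (USD), as publicly listed around early 2023.
    # Recalled from memory -- treat these figures as assumptions.
    prices = {
        "text-curie-001":   0.002,
        "text-davinci-003": 0.020,
        "gpt-3.5-turbo":    0.002,  # the ChatGPT API
        "gpt-4-8k":         0.030,  # prompt-side rate
    }

    # The claim: the ChatGPT API matches Curie's price, 10x below Davinci's,
    # which is consistent with it being a Curie-sized model.
    print(prices["text-davinci-003"] / prices["gpt-3.5-turbo"])  # 10.0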

  • It’s not only 10x cheaper, it’s also way faster at inference and not as smart as Davinci. IMO the only logical answer is that the model is just smaller (see the sketch below).
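    A minimal back-of-envelope for why size dominates per-token speed, assuming decoding is memory-bandwidth-bound (every generated token streams all the weights through memory once); the bandwidth figure and parameter counts are illustrative guesses, not disclosed specs.

        # Memory-bandwidth-bound autoregressive decoding: tokens/sec is
        # roughly bandwidth / bytes_of_weights per GPU. All figures are
        # illustrative assumptions, not disclosed specs.
        BANDWIDTH = 2.0e12      # bytes/sec (~2 TB/s, A100-80GB class)
        BYTES_PER_PARAM = 2     # fp16 weights

        def tokens_per_sec(params: float) -> float:
            return BANDWIDTH / (params * BYTES_PER_PARAM)

        for name, params in [("Curie-sized (~6.7B guess)", 6.7e9),
                             ("Davinci-sized (175B)", 175e9)]:
            print(f"{name}: ~{tokens_per_sec(params):.0f} tokens/sec")
        # ~149 vs ~6 tokens/sec: a smaller model decodes far faster.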

I wonder why it's slower at inference time, then (for users of the web UI). Or rather, if it's similar in size to GPT-3, how is GPT-3 optimized in a way that GPT-4 isn't or can't be?

I'd expect that by now we would enjoy similar speeds, but this hasn't happened yet.

  • GPT-4 is the same speed as legacy GPT-3 ChatGPT for me. It's only occasionally slower, which I expect is due to load and not to the model being larger.

    • Interesting. I remember that when the ChatGPT speedup happened, the API prices dropped by around 10x, so I'd imagine there were some tricks to make it run faster.

      If they still haven't implemented these, it would be positively surprising (to me) to see the model run at speeds similar to ChatGPT's now. It'd be a great achievement if they really packed such performance into a similar architecture (say, by just training longer).
