Comment by iliane5

3 years ago

I bet they’re not saying how big of a model GPT-4 is because it’s actually much smaller than we would expect.

ChatGPT is IMO a heavily fine-tuned Curie-sized model (same price via the API, plus less cognitive capacity than even text-davinci-003), so it would make sense that a heavily fine-tuned Davinci-sized model would yield similar results to GPT-4.

I wouldn't bet on their pricing being indicative of their costs. If MSFT wants the ChatGPT API to be a success and is willing to subsidize it, that's just how it is.
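As a rough sketch of the pricing argument: the per-1K-token rates below are the publicly listed OpenAI prices from around early 2023 as I recall them, so treat the exact figures as assumptions, not a source.

    # Per-1K-token API prices (USD), as publicly listed around early 2023.
    # Recalled from memory -- treat these figures as assumptions.
    prices = {
        "text-curie-001":   0.002,
        "text-davinci-003": 0.020,
        "gpt-3.5-turbo":    0.002,  # the ChatGPT API
        "gpt-4-8k":         0.030,  # prompt-side rate
    }

    # The claim: the ChatGPT API matches Curie's price, 10x below Davinci's,
    # which is consistent with it being a Curie-sized model.
    print(prices["text-davinci-003"] / prices["gpt-3.5-turbo"])  # 10.0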

  • It’s not only 10x cheaper, it’s also way faster at inference and not as smart as Davinci. IMO the only logical answer is that the model is just smaller (see the sketch below).
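    A minimal back-of-envelope for why size dominates per-token speed, assuming decoding is memory-bandwidth-bound (every generated token streams all the weights through memory once); the bandwidth figure and parameter counts are illustrative guesses, not disclosed specs.

        # Memory-bandwidth-bound autoregressive decoding: tokens/sec is
        # roughly bandwidth / bytes_of_weights per GPU. All figures are
        # illustrative assumptions, not disclosed specs.
        BANDWIDTH = 2.0e12      # bytes/sec (~2 TB/s, A100-80GB class)
        BYTES_PER_PARAM = 2     # fp16 weights

        def tokens_per_sec(params: float) -> float:
            return BANDWIDTH / (params * BYTES_PER_PARAM)

        for name, params in [("Curie-sized (~6.7B guess)", 6.7e9),
                             ("Davinci-sized (175B)", 175e9)]:
            print(f"{name}: ~{tokens_per_sec(params):.0f} tokens/sec")
        # ~149 vs ~6 tokens/sec: a smaller model decodes far faster.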

I wonder why it's slower at inference time, then (for users of the web UI). Or rather, if it's similar in size to GPT-3, how is GPT-3 optimized in a way that GPT-4 isn't or can't be?

I'd expect that by now we would enjoy similar speeds, but this hasn't happened yet.

  • GPT-4 is the same speed as legacy GPT-3 ChatGPT for me. It's only occasionally slower, which I expect is due to load and not to the model being larger.

    • Interesting. I remember that when the ChatGPT speedup happened, the API prices dropped by around 10x, so I'd imagine there were some tricks to make it run faster.

      If they still haven't implemented these, it would be positively surprising (to me) to see the model run at speeds similar to ChatGPT's now. It'd be a great achievement if they really packed such performance into a similar architecture (say, by just training longer).
