Comment by pseudosavant

2 years ago

I thought so too. Could it be that GPT-4 Turbo is more efficient for them to run, so the price is lower, but OpenAI configures it to match GPT-4's per-user token throughput over their API? There are a lot of ways they could allocate and configure their GPU resources so that GPT-4 Turbo provides the same per-user throughput while greatly increasing their total system throughput.
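For a rough sense of that trade-off, here is a back-of-the-envelope sketch (all numbers are invented for illustration, not OpenAI's actual figures): batching more requests onto the same GPUs multiplies aggregate throughput even when each user's tokens/sec stays flat.

    # Back-of-the-envelope: batching raises system throughput while
    # per-user throughput stays roughly constant. Numbers are made up.
    def system_tps(per_user_tps: float, batch_size: int, overhead: float) -> float:
        """Aggregate tokens/sec across a batch, minus an efficiency loss."""
        return per_user_tps * batch_size * (1 - overhead)

    print(system_tps(30, batch_size=1, overhead=0.0))   # 30.0  tokens/sec total
    print(system_tps(30, batch_size=16, overhead=0.1))  # 432.0 tokens/sec total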

The speed of GPT-4 via ChatGPT varies greatly depending on when you're using it.

Could the data have been collected while the system was under different loads?

  • Unless they captured samples across many different times and days, that is very likely a factor. GPU resources are constrained enough that during peak times (which vary across the globe) token throughput will vary a lot; see the sampling sketch below.

  • The speed data is an average over 30 days.

    Clearly OpenAI is throttling their API to save costs and get more out of fewer GPUs.
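To make the different-loads point concrete, here is a minimal sketch of the kind of sampling that would separate peak from off-peak throughput. measure_once is a hypothetical stand-in: a real version would time an actual API completion and return tokens/sec.

    import random
    import statistics
    from datetime import datetime, timezone

    # Hypothetical stand-in: a real version would time one API completion
    # and return tokens/sec; here it's fake noise around ~30 tokens/sec.
    def measure_once() -> float:
        return random.gauss(30, 5)

    def record_sample(log: list) -> None:
        """Append (UTC hour, tokens/sec) so samples group by time of day."""
        log.append((datetime.now(timezone.utc).hour, measure_once()))

    def throughput_by_hour(log: list) -> dict:
        """Mean tokens/sec per UTC hour; peak-hour dips show up here but
        disappear into a single 30-day average."""
        by_hour: dict[int, list[float]] = {}
        for hour, tps in log:
            by_hour.setdefault(hour, []).append(tps)
        return {h: statistics.mean(v) for h, v in sorted(by_hour.items())}

    log: list = []
    record_sample(log)  # in practice, run this on a schedule over many days
    print(throughput_by_hour(log))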