Comment by pseudosavant

2 years ago

I thought so too. Could it be that GPT-4 Turbo is more efficient for them to run, so the price is lower, but OpenAI configures it to match GPT-4's per-user token throughput over their API? There are a lot of ways they could allocate and configure their GPU resources so that GPT-4 Turbo provides the same per-user throughput while greatly increasing their total system throughput.
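For a rough sense of that trade-off, here is a back-of-the-envelope sketch (all numbers are invented for illustration, not OpenAI's actual figures): batching more requests onto the same GPUs multiplies aggregate throughput even when each user's tokens/sec stays flat.

    # Back-of-the-envelope: batching raises system throughput while
    # per-user throughput stays roughly constant. Numbers are made up.
    def system_tps(per_user_tps: float, batch_size: int, overhead: float) -> float:
        """Aggregate tokens/sec across a batch, minus an efficiency loss."""
        return per_user_tps * batch_size * (1 - overhead)

    print(system_tps(30, batch_size=1, overhead=0.0))   # 30.0  tokens/sec total
    print(system_tps(30, batch_size=16, overhead=0.1))  # 432.0 tokens/sec total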

The speed of GPT-4 via ChatGPT varies greatly depending on when you're using it.

Could the data have been collected while the system was under different loads?

  • Unless they captured samples across many different times and days, that is very likely a factor. GPU resources are constrained enough that during peak times (which vary across the globe) token throughput will vary a lot; see the sampling sketch below.

  • The speed data is an average over 30 days.

    Clearly OpenAI is throttling their API to save costs and get more out of fewer GPUs.
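To make the different-loads point concrete, here is a minimal sketch of the kind of sampling that would separate peak from off-peak throughput. measure_once is a hypothetical stand-in: a real version would time an actual API completion and return tokens/sec.

    import random
    import statistics
    from datetime import datetime, timezone

    # Hypothetical stand-in: a real version would time one API completion
    # and return tokens/sec; here it's fake noise around ~30 tokens/sec.
    def measure_once() -> float:
        return random.gauss(30, 5)

    def record_sample(log: list) -> None:
        """Append (UTC hour, tokens/sec) so samples group by time of day."""
        log.append((datetime.now(timezone.utc).hour, measure_once()))

    def throughput_by_hour(log: list) -> dict:
        """Mean tokens/sec per UTC hour; peak-hour dips show up here but
        disappear into a single 30-day average."""
        by_hour: dict[int, list[float]] = {}
        for hour, tps in log:
            by_hour.setdefault(hour, []).append(tps)
        return {h: statistics.mean(v) for h, v in sorted(by_hour.items())}

    log: list = []
    record_sample(log)  # in practice, run this on a schedule over many days
    print(throughput_by_hour(log))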