Comment by scribu

2 years ago

I’m not sure about the Speed chart. I would expect gpt-4-turbo to be faster than plain gpt-4.

I thought so too. Could it be that GPT-4 Turbo is more efficient for them to run, so the price is lower, but OpenAI aims to keep its API token throughput matched to GPT-4's? There are many ways they could allocate and configure their GPU resources so that GPT-4 Turbo delivers the same per-user throughput while greatly increasing total system throughput.

  • The speed of GPT-4 via ChatGPT varies greatly depending on when you’re using it.

    Could the data have been collected while the system was under different loads?

    • Unless they sampled across many different times and days, that is very likely a factor. GPU resources are constrained enough that during peak hours (which vary across the globe) token throughput will vary a lot. Sampling throughput yourself at different times is straightforward; see the sketch after this thread.

    • The speed data is an average over 30 days.

      Clearly OpenAI is throttling their API to save costs and get more out of fewer GPUs.
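
To control for load, one could sample streamed token throughput at different hours and compare the two models directly. Here is a minimal sketch using the OpenAI Python client; the prompt, the model names, and the chunks-as-tokens approximation are illustrative assumptions, not the methodology behind the chart.

```python
# Rough sketch for sampling token throughput at different times of day.
# Counting streamed content chunks as a proxy for tokens is an approximation.
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def tokens_per_second(model: str, prompt: str) -> float:
    """Stream one completion and estimate tokens/sec from chunk arrival."""
    start = time.monotonic()
    tokens = 0
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            tokens += 1  # each content chunk is roughly one token
    return tokens / (time.monotonic() - start)

# Repeat at different hours and days to see how much load matters.
for model in ("gpt-4", "gpt-4-turbo"):
    rate = tokens_per_second(model, "Explain GPU batching in one paragraph.")
    print(f"{model}: {rate:.1f} tok/s")
```

Logging these numbers over a few days would show whether the gap between the two models holds up under different loads or washes out as peak-hour noise.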