I thought so too. Could it be that GPT-4 Turbo is more efficient for them to run, so the price is lower, but they try to maintain GPT-4's token throughput over their API? There are a lot of ways they could allocate and configure their GPU resources so that GPT-4 Turbo provides the same per-user throughput while greatly increasing their system throughput.
Unless they sampled across many different times and days, that is very likely a factor. GPU resources are constrained enough that during peak times (which vary across the globe) the token throughput will vary a lot.
Check out the graphs over time on the model pages - https://artificialanalysis.ai/models/gpt-4-turbo-1106-previe....
OpenAI are doing a ton of load balancing, presumably constantly tweaking batch sizes to try to optimize across all their workloads.
You can test GPT-4 vs GPT-4 Turbo in the Playground to intuitively confirm that the speeds are similar.
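If you want to go beyond eyeballing the Playground, here is a minimal sketch that times streamed output from both models via the API, using the official openai Python package. It assumes an OPENAI_API_KEY in your environment, and it counts streamed chunks as a rough proxy for tokens (each chunk is usually about one token); the prompt and max_tokens values are arbitrary choices, not anything from the article.

    import time
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def chunks_per_second(model: str, prompt: str) -> float:
        # Stream a completion and count chunks per second; each streamed
        # chunk is roughly one token, so this approximates tokens/sec.
        start = time.monotonic()
        count = 0
        stream = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=256,
            stream=True,
        )
        for chunk in stream:
            if chunk.choices and chunk.choices[0].delta.content:
                count += 1
        return count / (time.monotonic() - start)

    prompt = "Explain how a transformer generates text, step by step."
    for model in ("gpt-4", "gpt-4-1106-preview"):
        print(f"{model}: {chunks_per_second(model, prompt):.1f} chunks/sec")

A single run is noisy, of course; sampling this across different times of day and days of the week is what would actually surface the load-dependent variance discussed here.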
The speed of GPT-4 via ChatGPT varies greatly depending on when you're using it.
Could the data have been collected when the system is under different loads?
The speed data is an average over 30 days.
Clearly OpenAI is throttling its API to save costs and get more out of fewer GPUs.