I thought so too. Could it be that GPT-4 Turbo is more efficient for them to run, so the price is lower, but they try to maintain GPT-4's token throughput over their API? There are a lot of ways they could allocate and configure their GPU resources so that GPT-4 Turbo provides the same per-user throughput while greatly increasing their system throughput.
Unless they sampled across many different times and days, that is very likely a factor. GPU resources are constrained enough that during peak times (which vary across the globe) the token throughput will vary a lot.
Check out the graphs over time on the model pages - https://artificialanalysis.ai/models/gpt-4-turbo-1106-previe....
OpenAI are doing a ton of load balancing, presumably constantly tweaking batch sizes to try to optimize across all their workloads.
You can test GPT-4 vs GPT-4 Turbo in the Playground to intuitively confirm that the speeds are similar.
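If you want to go beyond eyeballing the Playground, here is a minimal sketch that times streamed output from both models via the API, using the official openai Python package. It assumes an OPENAI_API_KEY in your environment, and it counts streamed chunks as a rough proxy for tokens (each chunk is usually about one token); the prompt and max_tokens values are arbitrary choices, not anything from the article.

    import time
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def chunks_per_second(model: str, prompt: str) -> float:
        # Stream a completion and count chunks per second; each streamed
        # chunk is roughly one token, so this approximates tokens/sec.
        start = time.monotonic()
        count = 0
        stream = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=256,
            stream=True,
        )
        for chunk in stream:
            if chunk.choices and chunk.choices[0].delta.content:
                count += 1
        return count / (time.monotonic() - start)

    prompt = "Explain how a transformer generates text, step by step."
    for model in ("gpt-4", "gpt-4-1106-preview"):
        print(f"{model}: {chunks_per_second(model, prompt):.1f} chunks/sec")

A single run is noisy, of course; sampling this across different times of day and days of the week is what would actually surface the load-dependent variance discussed here.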
The speed of GPT-4 via ChatGPT varies greatly depending on when you're using it.
Could the data have been collected when the system is under different loads?
The speed data is an average over 30 days.
Clearly OpenAI is throttling its API to save costs and get more out of fewer GPUs.