Comment by rglynn

5 hours ago

IME tok/s is only useful with the additional context of ttft and total latency. At this point a given closed-model does not exist in a vaccuum but rather in a wider architecture that affects the actual performance profile for an API consumer.

This isn't usually an issue comparing models within the same provider, but it does mean cross-provider comparison using only tok/s is not apples-to-apples in terms of real-world performance.

1 comment

rglynn

Rapzid 4 hours ago

Exactly. Really frustrating they don't advertise TTFT and etc, and that it's really hard to find any info in that regard on newer models.

For voice agents gpt-4.1 and gpt-4.1-mini seem to be the best low latency models when you need to handle bigger data more complex asks.

But they are a year old and trying to figure out if these new models(instant, chat, realtime, mini, nona, wtf) are a good upgrade is very frustrating. AFAICT they aren't; the TTFT latencies are too high.