Comment by JanSt

13 hours ago

Those are (mostly) new, faster TPUs.

himata4113 13 hours ago

The latest TPUs appear to reach 800 tok/s rather than the advertised 300 tok/s.

mgambati 9 hours ago
Today they demoed the 8i running at roughly 1300 to 1600 tokens per second. I imagine that comes from having a single rack serving the model just for the demo.
- himata4113 8 hours ago
  
  There's a limit to how much you can "scale" this process; it's linear. Napkin math based on vLLM suggests that parallel batched streams only lose around 50% performance compared to single-stream output, so a dedicated rack doesn't explain the ridiculously fast numbers here.
  I wish Google would just come out and tell us how large their Flash model is, because if it's as big as or smaller than gpt-5.4-nano, that's the real headline here.
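
To spell out the napkin math above, here is a rough sketch. It assumes the advertised 300 tok/s is a per-stream figure under batched serving, and it reads the "~50%" as the share of single-stream speed that a batched stream keeps. Both are assumptions for illustration, not measured values, and the function name is made up.

```python
# Napkin math only. The 50% penalty is the commenter's rough vLLM estimate,
# and treating 300 tok/s as a batched per-stream figure is an assumption.
ADVERTISED_TOK_S = 300        # advertised speed quoted in the thread
BATCH_PENALTY = 0.5           # batched streams lose ~50% vs single-stream
DEMO_RANGE = (1300, 1600)     # demo speeds quoted in the thread


def single_stream_estimate(batched_tok_s: float, penalty: float) -> float:
    """Undo the batching penalty to estimate single-stream speed."""
    return batched_tok_s / (1.0 - penalty)


estimate = single_stream_estimate(ADVERTISED_TOK_S, BATCH_PENALTY)
print(f"estimated single-stream speed: {estimate:.0f} tok/s")

# How far the demo numbers sit above that estimate.
for demo in DEMO_RANGE:
    print(f"{demo} tok/s is {demo / estimate:.1f}x the estimate")
```

Under these assumptions a dedicated rack would get you to about 600 tok/s, so the demo figures are still more than twice what removing batching alone would explain.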