Comment by louiereederson
20 hours ago
For a 56.7 score on the Artificial Intelligence Index, GPT 5.5 used 22m output tokens. For a score of 57, Opus 4.7 used 111m output tokens.
The efficiency gap is enormous. Maybe it's the difference between a GB200 NVL72 and an Amazon Trainium chip?
why would the chip affect token quantity? that's a property of the model, not the hardware.
Chip costs strongly impact the economics of model serving.
It is entirely plausible to me that Opus 4.7 is designed to consume more tokens in order to artificially reduce the API cost/token, thereby obscuring the true operating cost of the model.
I agree though, I chose poor phrasing originally. Better to say that GB200 vs Trainium could contribute to the efficiency differential.
probably the wrong take - they're arms-racing to a better model. it's not the enshittification era for models just yet
Chips don't impact output quality at this magnitude.
True, but the quality of the power played a large part. Most likely nuclear power behind this high-quality token efficiency.
You need to compare total cost. Token count is irrelevant.
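To illustrate the point above: with per-token prices that differ enough, a 5x gap in token count can wash out entirely. The prices below are made up purely for illustration; they are not the real rates for either model.

```python
# Hypothetical per-token prices (illustrative only, NOT real pricing).
PRICE_PER_M_TOKENS = {"model_a": 10.00, "model_b": 2.00}  # USD per 1M output tokens

def total_cost(tokens_millions: float, price_per_m: float) -> float:
    """Total spend = output tokens (in millions) times price per million."""
    return tokens_millions * price_per_m

# Token counts from the benchmark run above: 22M vs 111M output tokens.
cost_a = total_cost(22, PRICE_PER_M_TOKENS["model_a"])
cost_b = total_cost(111, PRICE_PER_M_TOKENS["model_b"])
print(cost_a, cost_b)  # 220.0 222.0 - roughly the same total spend
```

Under these assumed prices, the "inefficient" model's run costs about the same in dollars, which is why token count alone tells you little.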
If it's a new pretrain, the token embeddings could be wider - you can pack more info into a token making its way through the system.
Like Chinese versus English - you need fewer Chinese characters to say something than you would need letters to write it in English.
So this model internally could be thinking in much more expressive embeddings.