Comment by yorwba

1 day ago

It's 2.2k tokens per second per GPU, so you have to multiply the token output by 16, and the price per million tokens works out to 22.5 cents.
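A rough sketch of that correction in Python. The hourly cluster price isn't given in this comment, so the value below is back-solved from the 22.5 cent figure and is only a placeholder; the function and variable names are made up for illustration.

```python
def cents_per_million_tokens(total_tokens_per_sec, cluster_cents_per_hour):
    """Price of one million output tokens, given total throughput and hourly cost."""
    tokens_per_hour = total_tokens_per_sec * 3600
    return cluster_cents_per_hour / tokens_per_hour * 1_000_000

TOKENS_PER_SEC_PER_GPU = 2200   # from the comment
NUM_GPUS = 16                   # from the comment

# ASSUMPTION: not stated in the comment. Back-solved so that the per-GPU
# reading reproduces the quoted 22.5 cents per million tokens.
CLUSTER_CENTS_PER_HOUR = 22.5 * (TOKENS_PER_SEC_PER_GPU * NUM_GPUS * 3600) / 1_000_000

# Misreading: treating 2.2k tokens/s as the throughput of the whole cluster.
misread = cents_per_million_tokens(TOKENS_PER_SEC_PER_GPU, CLUSTER_CENTS_PER_HOUR)

# Corrected: 2.2k tokens/s is per GPU, so total throughput is 16x higher.
corrected = cents_per_million_tokens(TOKENS_PER_SEC_PER_GPU * NUM_GPUS, CLUSTER_CENTS_PER_HOUR)

print(f"misread:   {misread:.1f} cents per million tokens")
print(f"corrected: {corrected:.1f} cents per million tokens")
```

Whatever the real hourly price is, the two readings differ by exactly the GPU count, since the price cancels out of the ratio.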

I think they're also running this at 16-bit precision. If they lower it to 8-bit, they might double their output, which would bring the price to about 11 cents per million tokens.

Now take into account that modern LLMs tend to use 4-bit inference and that Blackwell is significantly more optimized for 4-bit, and we could see much less than 11 cents. Maybe a 5x speedup going from H100 at 8-bit to Blackwell at 4-bit?

So we're looking at potentially 2.2 cents per million tokens.
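The chain of guesses written out as a quick Python sketch. It assumes the price per token scales inversely with throughput and that the hourly hardware price stays the same, which is a simplification. The 2x and 5x factors are the guesses from this comment, not measurements.

```python
BASELINE_CENTS = 22.5        # 16-bit figure from the comment, cents per million tokens

# Speculative speedups from the comment, not benchmarks.
SPEEDUP_16_TO_8_BIT = 2.0
SPEEDUP_8_BIT_H100_TO_4_BIT_BLACKWELL = 5.0

def scaled_cost(cents, speedup):
    """Cost after a throughput speedup, assuming the hourly price is unchanged."""
    return cents / speedup

cost_8bit = scaled_cost(BASELINE_CENTS, SPEEDUP_16_TO_8_BIT)
cost_4bit_blackwell = scaled_cost(cost_8bit, SPEEDUP_8_BIT_H100_TO_4_BIT_BLACKWELL)

print(f"8-bit:             {cost_8bit:.2f} cents per million tokens")
print(f"4-bit + Blackwell: {cost_4bit_blackwell:.2f} cents per million tokens")
```

Without rounding the intermediate step to 11 cents, this gives 11.25 and then 2.25 cents, so the ballpark is the same.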