Comment by ccgibson
14 hours ago
To add a bit more to what @scottcha is saying: overall GPU load has a fairly significant impact on the energy per result. Because the idle power draw of these servers is substantial, that fixed energy gets amortized over more requests as load rises, so energy per result falls roughly inversely with utilization. I imagine Anthropic is able to harness that efficiency, since their servers are presumably far from idle :)
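To make that amortization concrete, here's a toy sketch with entirely made-up power and throughput numbers (not real figures for any server) showing how the fixed idle draw dominates energy per result at low utilization:

```python
# Toy model (hypothetical numbers): energy per result falls as GPU load
# rises, because the server's fixed idle power is spread over more requests.

IDLE_POWER_W = 400.0      # assumed idle draw of a GPU server (made up)
PEAK_EXTRA_W = 600.0      # assumed additional draw at full load (made up)
PEAK_THROUGHPUT = 100.0   # assumed results per second at full load (made up)

def energy_per_result_j(utilization: float) -> float:
    """Joules per result at a given utilization (0 < utilization <= 1)."""
    power_w = IDLE_POWER_W + PEAK_EXTRA_W * utilization
    throughput = PEAK_THROUGHPUT * utilization
    return power_w / throughput

for u in (0.1, 0.5, 1.0):
    print(f"utilization {u:.0%}: {energy_per_result_j(u):.1f} J/result")
# utilization 10%: 46.0 J/result
# utilization 50%: 14.0 J/result
# utilization 100%: 10.0 J/result
```

With these numbers, a server at 10% load burns over 4x the energy per result of one at full load, which is the efficiency a heavily loaded fleet captures.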
You can infer the scale of that discount from the pricing of the batch API, which is presumably scheduled for minimum inference cost. Anthropic offers a 50% discount there, consistent with other model providers.