Comment by ccgibson
14 hours ago
To add a bit more to what @scottcha is saying: overall GPU load has a fairly significant impact on the energy per result. Because the idle power draw of these servers is substantial, that fixed energy gets amortized over more requests as load rises, so energy per result falls roughly inversely with utilization. I imagine Anthropic is able to harness that efficiency, since their servers are presumably far from idle :)
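To make that amortization concrete, here's a toy sketch with entirely made-up power and throughput numbers (not real figures for any server) showing how the fixed idle draw dominates energy per result at low utilization:

```python
# Toy model (hypothetical numbers): energy per result falls as GPU load
# rises, because the server's fixed idle power is spread over more requests.

IDLE_POWER_W = 400.0      # assumed idle draw of a GPU server (made up)
PEAK_EXTRA_W = 600.0      # assumed additional draw at full load (made up)
PEAK_THROUGHPUT = 100.0   # assumed results per second at full load (made up)

def energy_per_result_j(utilization: float) -> float:
    """Joules per result at a given utilization (0 < utilization <= 1)."""
    power_w = IDLE_POWER_W + PEAK_EXTRA_W * utilization
    throughput = PEAK_THROUGHPUT * utilization
    return power_w / throughput

for u in (0.1, 0.5, 1.0):
    print(f"utilization {u:.0%}: {energy_per_result_j(u):.1f} J/result")
# utilization 10%: 46.0 J/result
# utilization 50%: 14.0 J/result
# utilization 100%: 10.0 J/result
```

With these numbers, a server at 10% load burns over 4x the energy per result of one at full load, which is the efficiency a heavily loaded fleet captures.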
You can infer the scale of that discount from the pricing of the batch API, which is presumably scheduled for minimum inference cost. Anthropic offers a 50% discount there, consistent with other model providers.