Comment by jabedude

21 hours ago

But that's removing a component that's critical for the test. We as users/benchmark consumers care that the service, as provided by Anthropic/OpenAI/Google, is consistent over time given the same model/prompt/context.

Might as well have the free tokens, then, especially if it is an open benchmark they are already aware of. If they want to game it, they cannot be stopped from doing so when it's on their infra.