Comment by jabedude
21 hours ago
But that's removing a component that's critical for the test. We as users/benchmark consumers care that the service as provided by Anthropic/OpenAI/Google is consistent over time given the same model/prompt/context
21 hours ago
But that's removing a component that's critical for the test. We as users/benchmark consumers care that the service as provided by Anthropic/OpenAI/Google is consistent over time given the same model/prompt/context
Might as well have the free tokens, then, especially if it is an open benchmark they are already aware of. If they want to game it they cannot be stopped from doing so when it's on their infra.