Comment by jmalicki
7 hours ago
I think the point of the paper is to prod benchmark authors to at least try to make their benchmarks a little more secure and harder to hack, especially as AI is getting smart enough to unintentionally hack the evaluation environment itself, even when that is not the authors' intent.