Comment by Eisenstein

4 days ago

I'm not responding to 'it doesn't matter if they know they are being evaluated', because that isn't what you mentioned in your comment. What you said was 'they won't know they are being evaluated', which is what my reply addressed.

Oh ok, well then you’re definitely right about that: they can tell, and sometimes it really matters. (I can’t remember whether it was SWE-bench, but there was a major benchmark where the models were just inspecting git histories that had leaked into the dataset.) The more insidious case is alignment, but I don’t know alignment research well enough to say whether this is a big deal there.