Comment by sdfsefsdf
16 hours ago
Perhaps I've been deep in my own problems for too long, but it seems to me that the author is saying "don't trust the current evaluation suites too much": scores reflect only a small part of the problem. What's interesting is discovering a new, stable evaluation metric, building something new on top of it, and having that new thing produce unexpected, intelligent results.
This is certainly part of it! My point was that focusing on problems proposed by others is one very specific and pretty short-term mode of thinking. Good researchers improve benchmark scores. Great researchers think about what problem they're solving.