Comment by 0x3f
3 hours ago
I think this is a good way to test a certain kind of capability, but as to whether LLMs would pass such a test, I'm guessing almost certainly not. If you've ever used one for research, it's very 'in' the current literature, whatever that may be. It's an incredible retrieval tool, and it will glibly evaluate any novel ideas that you feed in, but analyses are often incorrect when there's a paucity of directly relevant training data.
No comments yet
Contribute on Hacker News ↗