Comment by BobbyJo

1 month ago

What does it mean to cross the median human paper mark? How is that measured?

It seems to me that most LLM benchmarks wind up being gamed. So even if there were a good benchmark for this, which I do not believe there is, its validity would likely diminish pretty quickly.