← Back to context

Comment by siva7

8 hours ago

Could it really be that not only we vibeslop all apps nowadays but also don't care to even check how ai solved a benchmark it claimed solved?

Frontier model developers try to check for memorization. But until AI interpretability is a fully solved problem, how can you really know whether it actually didn't memorize or your memorization check wasn't right?