Comment by Escapado

1 day ago

I agree with the sentiment, but if a sufficiently large number of sufficiently sophisticated benchmarks existed, I would be surprised if a model could merely memorize them all while still showing terrible real-world performance. We are not there yet, but maybe one day we will be.