Comment by moffkalast

3 days ago

It's a sort of arbitrary pattern matching thing that can't be trained on in the sense that the MMLU can be, but you can definitely generate billions of examples of this kind of task and train on it, and it will not make the model better on any other task. So in that sense, it absolutely can be.

I think it's been harder to solve because it's a visual puzzle, and we know how well today's vision encoders actually work https://arxiv.org/html/2407.06581v1

2 comments

moffkalast

km144 2 days ago

The real question is: Why are people designing benchmarks that, if a model is trained on them, it won't improve the performance of the model at any real-world tasks? Why would anyone care about such benchmarks?

moffkalast 2 days ago

People are like typewriter monkeys, if something is possible to make it'll eventually be made.