Comment by runarberg

3 days ago

The problem with benchmaxxing is that it lies about the capabilities of the technology. If all we wanted was a machine that plays chess, we would just use a chess engine, which we have known how to make for decades. If Google wanted Gemini to be able to play chess, it would be much easier (and better, and a hell of a lot cheaper) to stick a traditional chess engine into their product and defer all chess to that engine.

The claim here (way up thread) was: “we have the technology to train models to do anything that you can do on a computer, only thing that's missing is the data”, and the implication is that logic and reasoning are emergent properties of these models, given enough data and enough parameters. However, the evidence seems to suggest otherwise. Logic and reasoning have to be specifically programmed into these models. If that claim above were true, then with a dataset as vast as online chess games (lichess alone has 7.1 billion games), chess should be easy for LLMs, but it obviously isn’t. And that tells us something about the limitations of the technology.