
Comment by bluecalm

5 days ago

My view is that it's less impressive than the previous Go and chess results. Humans are worse at competitive math than at those games, and it's still a very limited space with well-defined problems. They may hype "general purpose" as much as they want, but for now it's still the case that AI is superhuman at well-defined, limited-space tasks and can't achieve the performance of a mediocre, below-average human at simple tasks without those limitations, like driving a car.

Nice result, but it's just another game humans got beaten at. This time it's a game which isn't even taken very seriously (in comparison to ones that have a professional scene).

The scope and creativity required for the IMO are much greater than for chess/Go. Also, the IMO is taken VERY seriously. It's a huge deal, much bigger than any chess or Go tournament.

  • In my opinion, competitive math (or programming) is about knowing some tricks and then trying to find a combination of them that works for a given task. The number of tricks and the depth required are much smaller than in Go or chess.

    I don't think it's a very creative endeavor in comparison to chess/Go. The search required is less as well. There is a challenge in processing natural language and producing solutions in it, though.

    The creativity required is not even a small fraction of what is required for scientific breakthroughs. After all, no task that you can solve in 30 minutes or so can possibly require that much creativity - just knowledge and a fast mind - things computers are amazing at.

    I am an AI enthusiast. I just think a lot of the things that have been done so far are more impressive than being good at competitive math. It's a nice result blown out of proportion by OpenAI employees.

    • I'd disagree with this take. Math olympiads are some of the most intellectually creative activities I've ever done that fit within a one-day time limit. Chess and Go don't even come close--I am not a strong player, but I've studied both games for hundreds of hours. (My hot take is that chess is not even very creative at all; that's why classical AI techniques produced superhuman results many years ago.)

      There is no list of tricks that will get a silver, much less a gold medal, at the IMO. The problem setters try very hard to choose problems that are not just variations of other contests or solvable by routine calculation (indeed, some types of problems, like polynomial inequalities, fell out of favor as near-universal techniques made them too routine for well-prepared students). Of course there are common themes and patterns that recur--no way around it given the limited curriculum they draw on--but overall I think the IMO does a commendable job of encouraging out-of-the-box thinking within a limited domain. (I've heard a contestant say that IMO prep was memorizing a lot of template solutions, but he was such a genius among geniuses that I think his opinion is irrelevant to the rest of humanity!)

      Of course there is always debate over whether competition math reflects skill in research math and other research domains. There are obvious areas of overlap and obvious areas of difference, so it's hard to extrapolate from AI math benchmarks to other domains. But I think it's fair to say the skills needed for the IMO include quite general quantitative reasoning ability, which is very exciting to see LLMs develop.


The significance, though, is that the "very limited space and well defined problems" continue to expand. Moving from a purpose-built system for playing a single game to a system that can address a broader set of problems would still be a significant step, as more high-value tasks will fall within its competency range. It seems the next big step will be on us to improve eval/feedback systems for less well-defined problems.