
Comment by bwfan123

4 days ago

To me, this is a tell of human involvement in the model solution.

There is no reason why machines would do badly on exactly the problem that humans also do badly on - unless humans were prodding the machine towards a solution.

Also, there is no reason why machines could not produce a partial or wrong answer to problem 6, which smells like survivorship bias to me - i.e., only correct solutions were cherry-picked.
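
A quick simulation makes the cherry-picking concern concrete. This is purely an illustrative sketch - the per-attempt success rates and the best-of-8 setup are my own assumptions, not anything disclosed about the actual evaluation: if you run n independent attempts per problem and publish only successful ones, the apparent rate is 1 - (1-p)^n rather than p.

    # Illustrative only: how much "publish only the best run" inflates
    # an apparent success rate, assuming n independent attempts that
    # each succeed with probability p. All numbers here are made up.
    import random

    def cherry_picked_rate(p, n, trials=100_000):
        # Fraction of trials in which at least one of n attempts
        # succeeds, i.e. the rate you'd see if only successful runs
        # were reported.
        wins = sum(
            any(random.random() < p for _ in range(n))
            for _ in range(trials)
        )
        return wins / trials

    for p in (0.1, 0.3, 0.5):
        print(f"per-attempt {p:.0%} -> best-of-8 looks like "
              f"{cherry_picked_rate(p, 8):.0%}")

Even a 30% per-attempt solve rate looks like roughly 94% under best-of-8 selection, which is why "how many runs were attempted?" matters.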

There is at least one reason: it was a harder problem. Agreed that which IMO problems are hard for a human participant and which are hard for an LLM are different things, but it seems like they should at least be positively correlated?

  • IMO problems are not hard. They are merely tricky. They primarily test pattern recognition, requiring a flash of insight to find the hidden clue.

    So it's no wonder that AI can solve them so well. Neural networks are great at pattern recognition.

    A better test is to ask the AI to come up with good Olympiad problems. I went ahead and tried, and the results are average.

While it's zero proof (the training data is human-generated, after all), you raise an interesting point: the financial stakes in LLM research are so high that we should be skeptical of all frontier results.

An internet-connected machine that reasons like a human would have been considered a fraud by default 5 years ago; it's not unthinkable that some researchers would fake it till they made it, but of course you need proof before making such an accusation.

> There is no reason why machines would do badly on exactly the problem that humans also do badly on

Unless the machine is trained to mimic the human thought process.

Maybe it’s a hint that our current training techniques can create models comparable to the best humans in a given subject, but that’s the limit.

  • We've hit the limit of 'our current training techniques'? This result literally used newly developed techniques that surprised researchers at OpenAI.

    Noam Brown: 'This result is brand new, using recently developed techniques. It was a surprise even to many researchers at OpenAI.'

    So your thesis is that these new techniques - which just produced unexpected breakthroughs - represent some kind of ceiling? That's an impressive level of confidence about the limits of methods we apparently just invented, in a field that, if anything, seems to be accelerating.

  • > Maybe it’s a hint that our current training techniques can create models comparable to the best humans in a given subject, but that’s the limit.

    The IMO is not the best humans in a given subject: college competitions are at a much higher level than high school competitions, and there are even higher levels above that, since college competitions are still limited to students.

Lmao.

You know IMO questions are not all equally difficult, right? They're specifically designed to vary in difficulty. The reason problem 6 is hard for both humans and LLMs is... it's hard! What a surprise.

Lol the OpenAI naysayers on this site are such conspiracy theorists.

There are many things that are hard for AIs for the same reason they're hard for humans: there are subtleties in complexity that make challenging things universally challenging.

Obviously the model was trained on human data, so its competencies lie in the areas of mathematics where humans have provided input over the years; but that isn't data contamination, that's how all humans learn. This model, like the contestants, never saw the questions before.