Comment by tester756

5 days ago

huh?

any details?

[flagged]

  • Which would be impressive if we knew those problems weren't in the training data already.

    I mean, it is quite impressive how language models can mobilize the knowledge they have been trained on, especially since they can retrieve information from sources that may be formatted very differently, with completely different problem statements, different variable names, and so on, and really operate at the conceptual level.

    But we must be wary of mixing up smart information retrieval with reasoning.

    • Even if we accept as a premise that these models are doing "smart retrieval" and not "reasoning" (neither of which are being defined here, nor do I think we can tell from this tweet even if they were), it doesn't really change the impact.

      There are many industries in which the vast majority of work done is closer to what I think you mean by "smart retrieval" than to what I think you mean by "reasoning": adult primary care and pediatrics, finance, law, veterinary medicine, software engineering, etc. At least half, if not upwards of 80%, of the work in each of these fields is effectively pattern matching against a known set of protocols. They absolutely deal in novel problems as well, but those are not the majority of their work.

      Philosophically it might be interesting to ask what "reasoning" means, and how we can assess if the LLMs are doing it. But, practically, the impacts to society will be felt even if all they are doing is retrieval.

    • Considering these LLMs utilise the entirety of the internet, there will be no unique problems that come up in the Olympiad. Even over the course of a degree, you will likely have been exposed to 95% of the various ways to write problems. As you say, retrieval is really the only skill here. There is likely no reasoning.