Comment by maxnevermind

4 hours ago

That is informative. I suspected this is how models improve their performance on convoluted "non-googleable" benchmarks like SimpleBench: they get a taste of those questions from publicly available samples, then hire people to generate similar questions and provide answers for them.

I wonder if extracting those static reasoning chains makes sense given Rich Sutton's "The Bitter Lesson" and Geoffrey Hinton's "People should stop training radiologists now." I guess participants won't stop as long as they are making money. I'm not sure they are; so far it seems to be more about the expectation of profitability, as I understand it.

4 comments

jmalicki 3 hours ago

At one level, this training data gives examples of specific static reasoning chains.

Given exposure to enough reasoning chains, with training data that is designed around adversarial reasoning and teaching models to reason, these types of training data might be key to teaching models to reason beyond what they could gather from static data.

maxnevermind 3 hours ago
> these types of training data might be key to teaching models to reason beyond what they could gather from static data.
I was under the impression that whenever LLMs try to be truly novel and have to make assumptions in areas where they didn't have enough training data points, the results are not good. Has that changed?
- jmalicki 2 hours ago
  
  If LLMs were already good at it, the AI labs wouldn't be paying this insane amount of money for people to generate training data to teach them.