Comment by modeless
5 days ago
Noam Brown:
> this isn’t an IMO-specific model. It’s a reasoning LLM that incorporates new experimental general-purpose techniques.
> it’s also more efficient [than o1 or o3] with its thinking. And there’s a lot of room to push the test-time compute and efficiency further.
> As fast as recent AI progress has been, I fully expect the trend to continue. Importantly, I think we’re close to AI substantially contributing to scientific discovery.
I thought progress might be slowing down, but this is clear evidence to the contrary. Not the result itself, but the claims that it is a fully general model and has a clear path to improved efficiency.
> it’s also more efficient [than o1 or o3] with its thinking.
"So under his saturate response, he never loses. For her to win, must make him unable at some even -> would need Q_{even-1}>even, i.e. some a_j> sqrt2. but we just showed always a_j<=c< sqrt2. So she can never cause his loss. So against this fixed response of his, she never wins (outcomes: may be infinite or she may lose by sum if she picks badly; but no win). So she does NOT have winning strategy at λ=c. So at equality, neither player has winning strategy."[1]
Why use lot word when few word do trick?
1. https://github.com/aw31/openai-imo-2025-proofs/blob/main/pro...
That's a big leap from "answering test questions" to "contributing to scientific discovery".
Having spent tens of thousands of hours contributing to scientific discovery by reading dense papers for a single piece of information, reverse engineering code written by biologists, and tweaking graphics to meet journal requirements… I can say with certainty it’s already contributing by allowing scientists to spend time on science versus spending an afternoon figuring out which undocumented argument in an R package from 2008 changes chart labels.
This. Even if LLMs ultimately hit some hard ceiling as substantially-better-Googling automatons, they would already accelerate all thought-based work across the board, and that’s the level they’re already at now (arguably they’re beyond that).
We’re already at the point where these tools are removing repetitive/predictable tasks from researchers (and everyone else), so clearly they’re already accelerating research.
1 reply →
That is not what they mean by contributing to scientific discovery.
1 reply →
Yeah, that’s the dream, but same as with the bar exams, they are fine-tuning the models for specific tests. The model has probably even been trained on previous versions of those tests.
What's the clear path to improved efficiency now that we've reached peak data?
> now that we've reached peak data?
A) that's not clear
B) now we have "reasoning" models that can be used to analyse the data, create n rollouts for each data piece, and "argue" for / against / neutral on every piece of data going into the model (rough sketch below). Imagine having every page of a "short story book" plus the 10 best "how to write" books, and doing n x n over them. Huge compute, but basically infinite data as well.
We went from "a bunch of data" to "even more data" to "basically everything we got" to "ok, maybe use a previous model to sort through everything we got and only keep quality data" to "ok, maybe we can augment some data with synthetic datasets from tools etc" to "RL goes brrr" to (point B from above) "let's mix the data with quality sources on best practices".
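To make point B concrete, here is a rough sketch of what "n rollouts per data piece" could look like. Everything in it is hypothetical and only illustrates the shape of the pipeline; generate stands in for whatever inference API you have, it is not a real library call.

    # Hypothetical synthetic-data loop: ask a reasoning model to argue for,
    # against, and neutrally about each document, and keep the critiques as
    # extra training text.

    STANCES = ["for", "against", "neutral"]

    def augment(documents, generate, n_rollouts=3):
        synthetic = []
        for doc in documents:
            for stance in STANCES:
                for _ in range(n_rollouts):
                    prompt = (
                        f"Read the passage below and argue {stance} its claims, "
                        f"drawing on the best writing guides you know.\n\n{doc}"
                    )
                    synthetic.append(generate(prompt))
        return synthetic

The cost scales as documents x stances x rollouts, which is exactly the "huge compute, but basically infinite data" trade-off.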
Are you basically saying synthetic data and having a bunch of models argue with each other to distill the most agreeable of their various outputs solves the issue of peak data?
Because from my vantage point, those have not given step changes in AI utility the way crunching tons of data did. They have only incrementally improved things.
A) We are out of Internet-scale-for-free data. Of course, the companies deploying LLM-based systems at massive scale are ingesting a lot of human data from their users, which they are seeking to use to further improve their models.
B) Has learning through "self-play" (as with AlphaZero etc.) been demonstrated to work for improving LLMs? What is the latest key research on this?
2 replies →
The thing is, people claimed already a year or two ago that we'd reached peak data and progress would stall since there was no more high-quality human-written text available. Turns out they were wrong, and if anything progress accelerated.
The progress has come from all kinds of things. Better distillation of huge models into small ones (sketched below). Tool use. Synthetic data (which is not leading to model collapse as theorized). Reinforcement learning.
I don't know exactly where the progress over the next year will be coming from, but it seems hard to believe that we'll just suddenly hit a wall on all of these methods at the same time and discover no new techniques. If progress had slowed down over the last year the wall being near would be a reasonable hypothesis, but it hasn't.
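For what it's worth, distillation is the mechanically simplest of those and it still buys a lot. A minimal sketch of the standard soft-label recipe in PyTorch (generic textbook form, not anything any lab has disclosed about their internal setup):

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # Train the student to match the teacher's softened output
        # distribution via KL divergence.
        t = temperature
        student_log_probs = F.log_softmax(student_logits / t, dim=-1)
        teacher_probs = F.softmax(teacher_logits / t, dim=-1)
        # Scale by t^2 so gradient magnitudes stay comparable across temperatures.
        return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * t * t

In practice this term is usually mixed with the ordinary next-token cross-entropy on hard labels.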
I'm loving it, can't wait to deploy this stuff locally. The mainframe will be replaced by commodity hardware, and OpenAI will go down the path of IBM unless they reinvent themselves.
> people claimed already a year or two ago that we'd reached peak data and progress would stall
The claim was that we've reached peak data (which, yes, we did) and that progress would have to come from new models or techniques. Everything you described has made incremental changes, not step changes. Incremental changes are effectively stalled progress. Even this model has no proof and no release behind it.
There is also a huge realm of private/commercial data which has not been absorbed by LLMs yet. I think there is far more private/commercial data than public data.
We're so far from peak data that we've barely even scratched the surface, IMO.
What changed since this announcement?
> “We’ve achieved peak data and there’ll be no more,” OpenAI’s former chief scientist told a crowd of AI researchers.
I assume there was tool use in the fine tuning?
There wasn’t in the CoT for these problems.
> I think we’re close to AI substantially contributing to scientific discovery.
The new "Full Self-Driving next year"?
"AI" already contributes "substantially" to "scientific discovery". It's a very safe statement to make, whereas "full self-driving" has some concrete implications.
"AI" here means language models. Machine learning has been contributing to scientific discovery for ages, but this new wave of hype that marketing departments are calling "AI" are language models.
Well, I also think full self-driving contributes substantially to navigating the car on the street...
I know it’s a meme but there actually are fully self driving cars, they make thousands of trips every day in a couple US cities.
The capitalization makes it a Tesla reference, which has notoriously been promising that as an un-managed consumer capability for years, while it is not yet launched even now.
> in a couple US cities
FWIW, when you get this reductive with your criterion there were technically self-driving cars in 2008 too.
2 replies →
I thought FSD has to be at least level 4 to be called that.
As an aside, that is happening in China right now in commercial vehicles. I rode a robotaxi last month in Beijing, and those services are expanding throughout China. Really impressive.
We have Waymo and AlphaFold.
How is a claim "clear evidence" of anything?
I read the GP's comment as "but [assuming this claim is correct], this is clear evidence to the contrary."
Most evidence you have about the world is claims from other people, not direct experiment. There seems to be a thought-terminating cliche here on HN, dismissing any claim from employees of large tech companies.
Unlike seemingly most here on HN, I judge people's trustworthiness individually and not solely by the organization they belong to. Noam Brown is a well known researcher in the field and I see no reason to doubt these claims other than a vague distrust of OpenAI or big tech employees generally which I reject.
> I judge people's trustworthiness individually and not solely by the organization they belong to
This is certainly a courageous viewpoint – I imagine this makes it very hard for you to engage in the modern world? Most of us are very bound by institutions we operate in!
1 reply →
> dismissing any claim from employees of large tech companies
Me: I have a way to turn lead into gold.
You: Show me!!!
Me: NO (and then spends the rest of my life in poverty).
Cold fusion (the physics, not the programming language) is the best example of why you "show your work". This is the Valley we're talking about. It's the thunderdome of technology and companies. If you have a meaningful breakthrough, you don't talk about it; you drop it on the public and flex.
3 replies →
A thought-terminating cliché? Not at all, certainly not when it comes to claims of technological or scientific breakthroughs. After all, that's partly why we have peer review and an emphasis on reproducibility. Until such a claim has been scrutinised by experts or reproduced by the community at large, it remains an unverified claim.
>> Unlike seemingly most here on HN, I judge people's trustworthiness individually and not solely by the organization they belong to.
That has nothing to do with anything I said. A claim can be false without being fraudulent; in fact, most false claims are probably not fraudulent, though still false.
Claims are also very often contested. See e.g. the various claims of quantum supremacy and the debate they have generated.
Science is a debate. If we believe everything anyone says automatically, then there is no debate.
2 replies →
OpenAI have already shown us they aren’t trustworthy. Remember the FrontierMath debacle?
7 replies →
[flagged]
Thing is, for example, all of classical physics can be derived from Newton's laws, Maxwell's equations and the laws of Thermodynamics, all of which can be written on a slip of paper.
A sufficiently brilliant and determined human can invent or explain everything armed only with this knowledge.
There's no need to train him on a huge corpus of text, like they do with ChatGPT.
Not sure what this model's like, but I'm quite certain it's not trained on terabytes of Internet and book dumps, but rather is trained for abstract problem solving in some way, and is likely much smaller than these trillion parameter SOTA transformers, hence is much faster.
If you look at the history of physics, I don't think it really worked like that. It took about two centuries from Newton to Maxwell because it's hard to just deduce everything from basic principles.
I think you misunderstand me; I'm not making some pie-in-the-sky statement about AI being able to discover the laws of nature in an afternoon. I'm just making the observation that if you know the basic equations, and enough math (which is about multivariate calc), you can derive every single formula in your physics textbook (and most undergrads do as part of their education).
Since smart people can derive a lot of knowledge from a tiny set of axioms, smart AIs should be able to as well, which means you don't need to rely on a huge volume of curated information. Which means that ingesting the internet and training on a terabyte of text might not be how these newer models are trained, and since they don't need to learn that much raw information, they might be smaller and faster.
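To make that concrete with a toy example (SymPy here is purely my illustration, nothing to do with how this model was actually trained): hand the machine Newton's second law for a mass on a spring, and the textbook formula falls out rather than being memorized.

    import sympy as sp

    t = sp.symbols('t')
    m, k = sp.symbols('m k', positive=True)
    x = sp.Function('x')

    # Axiom: Newton's second law for a mass on a spring, m*x'' = -k*x
    newton = sp.Eq(m * x(t).diff(t, 2), -k * x(t))

    # The simple-harmonic-oscillator solution is derived, not retrieved:
    print(sp.dsolve(newton, x(t)))
    # e.g. x(t) = C1*sin(sqrt(k/m)*t) + C2*cos(sqrt(k/m)*t)  (exact form varies by version)

The point is only that the formula comes out of the axiom plus calculus, not out of having seen a terabyte of text that happens to contain it.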
2 replies →
Right, humans are pretrained on terabytes of sensory data instead.
And the billions of years of evolution, and the language that you use to explain the task to him, and the schooling he needs to understand what you're saying, and... and and and?