Comment by constantcrying

5 days ago

>if OpenAI ran this 10000 times in parallel and cherry-picked the best one, this is a lot less exciting.

That depends entirely on who did the cherry picking. If the LLM had 10000 attempts and a human had to falsify each one, this story means absolutely nothing. If the LLM itself did the cherry picking, then this is akin to a human solving a hard problem: attempting solutions and falsifying them until the desired result is achieved. The difference is that the LLM scales with compute, while humans operate only sequentially.
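
To make the distinction concrete, here is a rough sketch of the "LLM does its own cherry picking" case. Everything in it is a hypothetical stand-in: `best_of_n`, `generate`, and `verify` are made-up names, and the toy task (finding a nontrivial factor of 91) replaces a real proof problem. It says nothing about how OpenAI's system actually works. The point is only that the verifier checks a candidate without being handed the answer.

```python
import random
from typing import Callable, Optional


def best_of_n(
    generate: Callable[[], int],
    verify: Callable[[int], bool],
    n: int,
) -> Optional[int]:
    """Try up to n candidates; return the first one the verifier accepts."""
    for _ in range(n):
        candidate = generate()
        if verify(candidate):
            return candidate
    return None  # every attempt was falsified


# Toy stand-ins (assumptions, not a real model or grader).
rng = random.Random(0)
TARGET = 91


def generate() -> int:
    # A "model" that just guesses; a real system would sample from an LLM.
    return rng.randint(2, TARGET - 1)


def verify(candidate: int) -> bool:
    # The verifier only checks divisibility; it never sees a reference answer.
    return TARGET % candidate == 0


result = best_of_n(generate, verify, n=10_000)
print(result)
```

If `verify` were replaced by a human reading all 10000 attempts, or by a check against the known solution, the same loop would be the uninteresting case. The attempts can also run in parallel, which is the "scales with compute" part.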

10 comments

johnecheck 5 days ago

The key bit here is whether the LLM doing the cherry picking had knowledge of the solution. If it didn't, this is a meaningful result. That's why I'd like more info, but I fear OpenAI is going to try to keep things under wraps.

aluminum96 5 days ago

Mark Chen posted that the system was locked before the contest. [1] It would obviously be crazy cheating to give verifiers a solution to the problem!
[1] https://x.com/markchen90/status/1946573740986257614?s=46&t=H...

diggan 5 days ago

> If it didn't

We kind of have to assume it didn't, right? Otherwise bragging about the results makes zero sense and would be outright misleading.
- samat 5 days ago
  
  > would be outright misleading
  Why wouldn't they? What are the incentives not to?
- lucianbr 4 days ago
  
  Corporations mislead to make money all the damn time.
- Dilettante_ 4 days ago
  
  "You really think someone would do that, just go on the internet and tell lies?"
  [https://youtube.com/watch?v=YWdD206eSv0]
- blibble 5 days ago
  
  OpenAI has been caught doing exactly this before.
  