Comment by constantcrying
5 days ago
>if OpenAI ran this 10000 times in parallel and cherry-picked the best one, this is a lot less exciting.
That entirely depends on who did the cherry-picking. If the LLM had 10000 attempts and a human had to falsify each one, this story means absolutely nothing. If the LLM itself did the cherry-picking, then this is akin to a human solving a hard problem: attempting solutions and falsifying them until the desired result is achieved. The only difference is that the LLM scales with compute, while humans operate only sequentially.
The key bit here is whether the LLM doing the cherry picking had knowledge of the solution. If it didn't, this is a meaningful result. That's why I'd like more info, but I fear OpenAI is going to try to keep things under wraps.
Mark Chen posted that the system was locked before the contest. [1] It would obviously be crazy cheating to give verifiers a solution to the problem!
[1] https://x.com/markchen90/status/1946573740986257614?s=46&t=H...
> If it didn't
We kind of have to assume it didn't, right? Otherwise bragging about the results makes zero sense and would be outright misleading.
> would be outright misleading
Why wouldn't they? What are the incentives not to?
Corporations mislead to make money all the damn time.
"You really think someone would do that, just go on the internet and tell lies?"
[https://youtube.com/watch?v=YWdD206eSv0]
OpenAI has been caught doing exactly this before.