
Comment by Davidzheng

5 days ago

I don't think it's much less exciting if they ran it 10,000 times in parallel. That implies an ability to discern when a proof is correct and rigorous (which o3 can't do consistently), and it also means that outputting the full proof is within the model's capabilities, even if rare.


FeepingCreature  5 days ago

The whole point of RL is that if you can get it to work 0.01% of the time, you can get it to work 100% of the time.
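
For a rough sense of the numbers in this thread, here is a minimal sketch of the arithmetic. It assumes the attempts are independent and share a single per-attempt success rate, which is a simplification not stated in either comment. The function name is made up for illustration.

```python
def p_at_least_one_success(p: float, n: int) -> float:
    """Probability that at least one of n independent attempts succeeds."""
    return 1.0 - (1.0 - p) ** n


p = 0.0001  # 0.01% per-attempt success rate (figure from the reply)
n = 10_000  # number of parallel attempts (figure from the parent comment)

# Assumption: attempts are independent with identical success probability.
print(f"{p_at_least_one_success(p, n):.3f}")  # prints 0.632
```

Under those assumptions, 10,000 attempts at 0.01% each give roughly a 63% chance that at least one is correct. That only helps if something can pick out the correct attempt, which is the parent comment's point about discerning a rigorous proof.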
