Comment by Davidzheng
6 days ago
We can only go off their word unfortunately, and they say no formal math, so I assume it's being evaluated by a verifier model instead of a formal system. There are actually some hints of this: geometry in Lean is not that well developed, so unless they also built their own system it's hard to do it formally (though their P2 proof is by coordinate bash, i.e. computation by algebra instead of geometric construction, so it's hard to tell).
> We can only go off their word
We’re talking about Sam Altman’s company here. The same company that started out as a nonprofit claiming they wanted to better the world.
Suggesting they should be given the benefit of the doubt is dishonest at this point.
“they must be lying because I personally dislike them”
This is why HN threads about AI have become exhausting to read.
In general I agree with you, but I see the point of requiring proof for their statements instead of accepting them at face value. Given previous experience, and considering that they benefit if these statements are believed, the burden of proof should be on those making the statements, not on those questioning them, no?
Those models seem to be special and not part of their normal product line, as is pointed out in the comments here. I would assume that in that case they indeed had the purpose of passing these tests in mind when creating them. Or were they created for something different, and only by chance turned out to be usable for the challenge?
Yeah, that's how the concept of "reputation" works.
No, they are likely lying, because they have huge incentives to lie.