Comment by suddenlybananas
1 hour ago
Why would they bother? Because it costs essentially nothing to add it to the training data. My point is that once a reasoning example becomes sufficiently viral, it ceases to be a good test because companies have a massive incentive to correct it. The fact some models got it right before (unreliably) doesn't mean they wouldn't want to ensure that the model gets it right.
No comments yet
Contribute on Hacker News ↗