Comment by sigmoid10
6 days ago
I'd also be highly wary of the method they used because of statements like this:
>we note that the vast majority of its answers simply stated the final answer without additional justification
While the reasoning steps are obviously important for judging human participant answers, none of the current big-name providers disclose their actual reasoning tokens. So unless they got direct internal access to these models from the big companies (which seems highly unlikely), this might be yet another failed study designed to (of which we have seen several in recent months, even by serious parties).