Comment by pjdesno
4 hours ago
They overstate their results in the headline.
In section 2, 34% of cases are found to have "substantive" disagreements, i.e. ratings that differ by two or more buckets: True + Misleading, Mostly True + False, or True + False.
This is probably a better measure than the headline figure. It's still a concerning fraction, although some of it is no doubt due to forcing "I don't know" cases to return an answer anyway.