According to the benchmark it is. "Only one verdict bucket can be correct per claim, so any disagreement among the panel means at least one model's verdict is label-inconsistent under this 4-bucket rubric (True / Mostly True / Misleading / False)"
Yes, they are much closer verdicts. True and Mostly True are also close. Used Krippendorff's α (ordinal) to not penalize much closer disagreements. 21% of the claims have models that are on the polar opposite sides - at least one True, and at least one False.
> Ruskin Bond was born on May 19, 1934, in Kasauli, Himachal Pradesh, India.
> In the Libra clubs' contract with Grupo Globo for broadcast rights through 2029, the audience-revenue distribution equals 30% of the fixed amount the clubs receive.
According to the benchmark it is. "Only one verdict bucket can be correct per claim, so any disagreement among the panel means at least one model's verdict is label-inconsistent under this 4-bucket rubric (True / Mostly True / Misleading / False)"
That claim is both false and misleading.
Yes, they are much closer verdicts. True and Mostly True are also close. Used Krippendorff's α (ordinal) to not penalize much closer disagreements. 21% of the claims have models that are on the polar opposite sides - at least one True, and at least one False.
Here are the claims with at least one True and at least one False:
https://lite.datasette.io/?csv=https%3A%2F%2Fstatic.simonwil...
A few examples:
> Ruskin Bond was born on May 19, 1934, in Kasauli, Himachal Pradesh, India.
> In the Libra clubs' contract with Grupo Globo for broadcast rights through 2029, the audience-revenue distribution equals 30% of the fixed amount the clubs receive.