Comment by kostaj
3 hours ago
Good idea about publishing intra-model variance data! Will include in the next version. Even if we put aside the two middle buckets (Mostly True and Misleading), that are somewhat subject to interpretation and hedging: On 21% of the claims still at least two models provide polar-opposite verdicts (one model saying True, and another saying False)
Of those 21% how many are time-dependent questions that are past the model’s training and requires research to verify? Like the “did Ukraine attack Russian in the past week” question?