Comment by roxolotl

4 hours ago

If we’re going to use LLMs as oracles, I don’t think the prompt is unreasonable. They are being sold as geniuses, and people are treating them as such, especially given the characterization of AI in science fiction as overly correct. A perfect tool that has “genius-level intelligence” would answer correctly.

What's the correct answer for "During a private Saturday call, Democratic members of the United States House of Representatives from Virginia and Hakeem Jeffries discussed strategies after losing a redistricting case at the Supreme Court of Virginia, including trying to flip two or three Republican-held seats under the existing map."?

You can only say True, False, Mostly True or Misleading.

(And you're not allowed to search for information.)

  • Search was enabled for 2 of the 5 models: Gemini and Sonar Pro. The disagreement between them is still high: they gave different verdicts on 42% of the claims. Fully agree that some of those claims are hard for a human to classify as well; that's the real-world messiness...

    • Why was it enabled for only 2 of the 5?

      Other burning questions: What methodology was used to choose the question set? Why not allow explanations? How many passes were done for each LLM?

Genius-level intelligence will tell you to get lost with your "no explanations" nonsense, and tell you why those categories don't make sense and why the question doesn't fit neatly into your boxes.