Comment by wongarsu

12 hours ago

It's also third-best overall on "AA-Omniscience Non-Hallucination Rate", far higher than DeepSeek, GPT 5.5, or Fable.

That's the one benchmark that allows LLMs to answer "I don't know" and punishes them for trying to bullshit their way through the questions.
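
To make the "punishes bullshitting" part concrete, here is a rough sketch of how abstention-aware scoring can work. The +1/0/-1 rule, the `score` function, and the way the non-hallucination rate is computed here are my own assumptions for illustration, not necessarily the benchmark's exact formulas.

```python
from collections import Counter

def score(answers):
    """answers: list of 'correct', 'wrong', or 'abstain' labels (hypothetical grader output)."""
    counts = Counter(answers)
    n = len(answers)
    # Assumed rule: +1 for correct, -1 for wrong, 0 for "I don't know".
    penalized = (counts["correct"] - counts["wrong"]) / n
    # Assumed definition: of the questions not answered correctly,
    # the share where the model abstained instead of guessing wrong.
    not_correct = counts["wrong"] + counts["abstain"]
    non_hallucination = counts["abstain"] / not_correct if not_correct else 1.0
    return penalized, non_hallucination

# A model that always guesses vs. one that admits when it doesn't know.
guesser = ["correct"] * 6 + ["wrong"] * 4
cautious = ["correct"] * 5 + ["wrong"] * 1 + ["abstain"] * 4

print(score(guesser))
print(score(cautious))
```

Under these assumed rules the cautious model comes out ahead on both numbers even though it gets fewer questions right, which is the behavior a plain accuracy benchmark can't reward.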