Comment by wongarsu
12 hours ago
It's also third best overall on "AA-Omniscience Non-Hallucination Rate", far higher than DeepSeek, GPT 5.5 or Fable.
That's the one benchmark that allows LLMs to answer "I don't know" and punishes them for trying to bullshit their way through the questions.