Comment by goldenarm, 10 hours ago:

It's a gibberish input detection benchmark, and does not measure output hallucinations.