Comment by throawayonthe

10 hours ago

It's a gibberish-input detection benchmark; it does not measure output hallucinations.