Comment by throawayonthe

13 hours ago

It's a gibberish-input detection benchmark; it does not measure output hallucinations.