Comment by gslepak
2 hours ago
> The non-hallucination rate in AA-omniscience is SOTA
Note that a perfect "non-hallucination rate" is rather meaningless, as such tests can contain human hallucinations.
It means the model aligns with the possibly-true, possibly-false beliefs of the group that made the test.
Here are some examples of the questions in the benchmark. If these are representative, they seem pretty cut and dried. https://artificialanalysis.ai/evaluations/omniscience#exampl...
Well, yes, garbage in garbage out. That's a given and not what's meant by "hallucination" in this context.