Comment by SubiculumCode
1 year ago
In some ways, it's a good tool to teach yourself to sus out the real clues to reliability, not format and authoritative tone.
But that's the thing. The only way to truly find out if it's reliable (>90%) is to check the data yourself.
This is why metrics and leaderboards like these are so important (but underreported): https://github.com/vectara/hallucination-leaderboard https://www.kaggle.com/facts-leaderboard
Google Gemini models seem to lead... hopefully the metrics aren't being gamed.