Comment by XCSme
7 hours ago
Yes.
I am trying to work out the best way to give as much information as possible about how the AI models fail, without revealing anything that could help them overfit to those specific tests.
I am planning to add some extra LLM calls that summarize the failure reason without revealing the test itself.
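A minimal sketch of that idea follows. It makes several assumptions not in the comment. `llm` is a hypothetical callable (prompt in, text out) standing in for whatever LLM client is used. The prompt wording is illustrative. The n-gram overlap check is an extra guard added here, not something the comment describes: if the summary copies any run of 5 or more words from the test question, expected answer, or model output, a generic message is returned instead.

```python
import re
from typing import Callable


def _ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in text (punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def leaks_test(summary: str, secrets: list[str], n: int = 5) -> bool:
    """True if the summary shares any n-word sequence with one of the secret texts."""
    summary_grams = _ngrams(summary, n)
    return any(summary_grams & _ngrams(s, n) for s in secrets)


def summarize_failure(
    test_prompt: str,
    expected: str,
    model_output: str,
    llm: Callable[[str], str],  # hypothetical: any function that sends a prompt to an LLM
    n: int = 5,
) -> str:
    """Ask an LLM for a short, test-agnostic failure reason and reject leaky answers."""
    instruction = (
        "In one sentence, describe in general terms why the answer below is wrong. "
        "Do not quote or paraphrase the question, the expected answer, or the model's answer.\n\n"
        f"Question:\n{test_prompt}\n\nExpected:\n{expected}\n\nModel answer:\n{model_output}"
    )
    summary = llm(instruction).strip()
    # Guard: if the summary reuses a long run of words from the test material, withhold it.
    if leaks_test(summary, [test_prompt, expected, model_output], n):
        return "Failed (reason withheld to avoid revealing the test)."
    return summary
```

The overlap check only catches verbatim copying; a paraphrase of the question would still get through, so it is a cheap first filter rather than a guarantee.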