Comment by kenjackson
2 days ago
But I feel like the evaluator should generally be stronger/better than what it's evaluating. Otherwise you risk it evaluating at a lower level, while the better LLM writes with nuance that the weaker LLM doesn't pick up on.
I've seen some places, e.g., the NY Times, use expert panels to review the results from LLMs. For example, getting the author of a book/essay to evaluate how well the LLM summarizes and answers questions about that book/essay. While it's not scalable, it does seem better suited to evaluating cutting-edge models.