Comment by tamassimond

8 months ago

To clarify, we use an Elo rating system to update model scores, so if you lose to a higher-rated story you don't lose as much Elo. I definitely agree with the criticism of LLM judges, though; how we can make them better is still an open question. Using the repeated story comparison judging system does help make them more consistent, and a good rubric helps make them more human-like as well. The really big question is how large the generator-verifier gap is between creating stories and marking them.
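For anyone unfamiliar with why a loss to a higher-rated story costs less, here is a minimal sketch of the textbook Elo update. The K-factor of 32, the 400-point scale, and the function names are assumptions for illustration; the actual constants and code used for the story rankings aren't stated here.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a tie.
    k=32 is an assumed K-factor, not a value taken from the comment above.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Whatever A gains, B loses, so the rating total is conserved.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # A 1500-rated story loses to a 1700-rated one, then to a 1500-rated one.
    after_strong, _ = update_elo(1500.0, 1700.0, 0.0)
    after_equal, _ = update_elo(1500.0, 1500.0, 0.0)
    print("Elo lost vs higher-rated:", 1500.0 - after_strong)
    print("Elo lost vs equal-rated:", 1500.0 - after_equal)
```

Because the expected score against a stronger opponent is already low, the penalty `k * (score_a - expected_a)` for actually losing is small; losing to an equally rated story costs half of K.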
