Comment by derac
4 days ago
I think running them against each other with a rules engine would be more interesting: count up illegal moves, wins, and unfinished games. I think LLM grading is too unreliable.
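A rough sketch of what that could look like, not a real benchmark. Everything here is an assumption for illustration: the game is tic-tac-toe (the comment doesn't name one), `play_game` and `tally` are made-up helper names, and the two stand-in players (`first_free`, `always_center`) are placeholders for whatever function would actually call a model and parse its move. The rules engine, not an LLM, decides what is legal and who won.

```python
from collections import Counter

# All eight winning lines on a 3x3 board indexed 0..8.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if a line is complete, else None."""
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def play_game(players, max_illegal=3):
    """Play one game. players maps 'X'/'O' to callable(board, mark) -> square.

    An illegal move is counted and the same player retries; after
    max_illegal illegal moves by one player the game is 'unfinished'.
    """
    board = [None] * 9
    illegal = Counter()
    mark = "X"
    while winner(board) is None and None in board:
        move = players[mark](list(board), mark)
        legal = isinstance(move, int) and 0 <= move < 9 and board[move] is None
        if not legal:
            illegal[mark] += 1
            if illegal[mark] >= max_illegal:
                return {"result": "unfinished", "illegal": dict(illegal)}
            continue
        board[move] = mark
        mark = "O" if mark == "X" else "X"
    return {"result": winner(board) or "draw", "illegal": dict(illegal)}

def tally(players, n_games):
    """Aggregate results and illegal-move counts over n_games."""
    totals = Counter()
    for _ in range(n_games):
        outcome = play_game(players)
        totals["result:" + outcome["result"]] += 1
        for mark, count in outcome["illegal"].items():
            totals["illegal:" + mark] += count
    return totals

# Hypothetical stand-ins for model-backed players.
def first_free(board, mark):
    return board.index(None)   # always legal: lowest empty square

def always_center(board, mark):
    return 4                   # becomes illegal once the center is taken

if __name__ == "__main__":
    print(tally({"X": first_free, "O": always_center}, n_games=5))
```

The retry-then-abandon policy for illegal moves is just one choice; forfeiting on the first illegal move would work too and changes what "unfinished" means.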