Comment by thurn
3 days ago
To clarify, the more accurate description would be "Testing how well LLMs can follow the rules of Magic", right? There is no actual evaluation of how "well" they are playing?