Comment by jdmoreira
3 days ago
I have a version of this where the LLMs play the duel decks "Elves vs. Goblins" against each other, using XMage as the rules engine.
Unfortunately, it gets really expensive to run, even with some optimizations for the context.
I can only afford to run the games with the DeepSeek models, and they sometimes make serious blunders. This is not an easy "harness" to build, and I don't have the time or disposable cash to work on it. I think a lot of work could still be done on improving it and on testing better models.
It would make an amazing "arena" bench. There are plenty more duel decks that are well balanced against each other.
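To make the "arena" idea concrete, here is a rough sketch of how head-to-head results could be turned into a ranking. It assumes Elo-style ratings, which is just one common choice for arena benchmarks and not something my harness does today. `play_game` is a hypothetical callback that would wrap a full XMage match and return the name of the winning model; the function names, the K-factor of 32 and the starting rating of 1500 are all illustrative.

```python
from itertools import combinations


def expected_score(r_a, r_b):
    """Probability that a player rated r_a beats a player rated r_b under Elo."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(ratings, winner, loser, k=32.0):
    """Shift rating points from loser to winner, in place."""
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


def run_arena(models, play_game, games_per_pair=2, start=1500.0):
    """Play every pair of models and return their final ratings.

    play_game(first, second) is a hypothetical hook: it should run one
    match (e.g. first on Elves, second on Goblins) and return the winner's name.
    """
    ratings = {m: start for m in models}
    for a, b in combinations(models, 2):
        for i in range(games_per_pair):
            # Swap sides every other game so deck imbalance cancels out.
            first, second = (a, b) if i % 2 == 0 else (b, a)
            winner = play_game(first, second)
            loser = second if winner == first else first
            update_elo(ratings, winner, loser)
    return ratings
```

Swapping sides between games matters here because even "balanced" duel decks are unlikely to be exactly 50/50, and each game costs real money, so every pairing should count for both decks.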