Comment by butlike

10 days ago

I don't mean to come across as OVERLY negative (just a little negative), but what's the difference between all these toy approaches and applications of LLMs? You've seen one LLM play a game against another LLM, you've seen them all.

I was thinking you could formally benchmark decks against each other en masse. MTG is not my wheelhouse, but with YGO at least, deck power is determined by frequency of use and placement at official tournaments. Imagine taking any permutation of cards, including undiscovered/untested ones, and simulating a vast number of games in parallel.

Of course, when you quantize deck quality to such a degree, I'd argue it's not fun anymore. YGO already isn't fun anymore because of this rampant quantization, and it didn't even take LLMs to get here.

  • Why would you use LLMs at all for that, can’t you just Monte Carlo this thing and be done with it?

    • You still need an algorithm to decide, for each game that you're simulating, what actual decisions get made. If that algorithm is dumb, then you might decide Mono-Red Burn is the best deck, not because it's the best deck but because the dumb algorithm can play Burn much better than it can play Storm, inflating Burn's win rate.

      In principle, LLMs could have a much higher strategy ceiling than deterministic decision-tree-style AIs. But my experience with mage-bench is that LLMs are probably not good enough to outperform even very basic decision-tree AIs today.
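      The bias described above can be sketched with a toy Monte Carlo model. Everything here is hypothetical (the deck names, `power` values, and the `skill` multiplier that stands in for how well a decision policy pilots a given deck); the point is only that the same two decks yield opposite rankings depending on the policy doing the piloting.

      ```python
      import random

      def simulate_match(deck_a, deck_b, skill_a, skill_b, rng):
          # Toy model: effective strength = intrinsic deck power scaled by
          # how well the decision policy pilots that deck (0.0 - 1.0).
          eff_a = deck_a["power"] * skill_a
          eff_b = deck_b["power"] * skill_b
          # Probability A wins is its share of total effective strength.
          return rng.random() < eff_a / (eff_a + eff_b)

      def win_rate(deck_a, deck_b, skill, n=10_000, seed=0):
          # skill maps deck name -> how well this policy plays that deck.
          rng = random.Random(seed)
          wins = sum(
              simulate_match(deck_a, deck_b,
                             skill[deck_a["name"]], skill[deck_b["name"]], rng)
              for _ in range(n)
          )
          return wins / n

      burn = {"name": "Burn", "power": 0.9}
      storm = {"name": "Storm", "power": 1.1}  # the stronger deck on paper

      # A "dumb" policy that plays Burn fine but misplays Storm badly,
      # versus a policy that plays both decks near their ceiling.
      dumb_policy = {"Burn": 1.0, "Storm": 0.6}
      good_policy = {"Burn": 1.0, "Storm": 1.0}

      print(win_rate(burn, storm, dumb_policy))  # Burn looks favored (> 0.5)
      print(win_rate(burn, storm, good_policy))  # Storm's real edge shows (< 0.5)
      ```

      Under the dumb policy, Burn's expected win rate is 0.9 / (0.9 + 0.66) ≈ 0.58, so the benchmark would crown the wrong deck; under the better policy it drops to 0.45, revealing Storm's edge. Same decks, same simulator, opposite conclusion.
      
      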
