Comment by nwiswell

3 years ago

I think unfortunately Elo is misapplied here.

Elo is appropriate for chess, where there is no initial game-state variance and no built-in advantage for either competitor except who moves first. That advantage can be addressed by averaging, by using the results of tournaments where the competitors swap colors, or simply by maintaining separate Elo ratings as white and as black.

Similarly for Starcraft you can track Elo separately for Terran/Zerg/Protoss. (Technically you would also need to do the same by map, but anyway...)
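A minimal sketch of the "separate Elo per side" idea, assuming a starting rating of 1500 and K = 32 (common defaults, but assumptions here). The names `ratings`, `expected`, and `record` are hypothetical. The point is only that ratings are keyed by (player, side), so a result as white never touches the same player's rating as black (or Terran vs. Zerg, and so on).

```python
from collections import defaultdict

START_RATING = 1500.0  # assumed starting rating
K = 32.0               # assumed K-factor

# One rating per (player, side) pair, e.g. ("alice", "white") or ("bob", "zerg").
ratings = defaultdict(lambda: START_RATING)

def expected(r_a: float, r_b: float) -> float:
    """Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def record(a: str, side_a: str, b: str, side_b: str, score_a: float) -> None:
    """Update both side-specific ratings; score_a is 1 (win), 0.5 (draw) or 0 (loss)."""
    key_a, key_b = (a, side_a), (b, side_b)
    e_a = expected(ratings[key_a], ratings[key_b])  # computed before either update
    ratings[key_a] += K * (score_a - e_a)
    ratings[key_b] += K * ((1.0 - score_a) - (1.0 - e_a))
```

After `record("alice", "white", "bob", "black", 1.0)`, alice-as-white sits at 1516, bob-as-black at 1484, and alice-as-black is still 1500.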

With MTG, you have a huge effect from the quality of the deck. Unless you have each player play with each deck, there's no way to de-convolute the quality of the deck vs the quality of the player. And if you did have that data, Elo couldn't leverage it -- you'd need a more sophisticated model to account for that statistical effect.
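For what that "more sophisticated model" might look like, here is a sketch of one option (not the article's method): a Bradley-Terry-style logistic model where each side's strength is player skill plus deck quality, fitted by plain gradient ascent. The function names, learning rate, epoch count, and the small L2 penalty are all assumptions for illustration. As the comment says, it can only separate the two effects when players are actually seen with more than one deck.

```python
import math

def win_prob(skill, quality, px, dx, py, dy):
    """P(player px on deck dx beats player py on deck dy) under the additive model."""
    z = skill[px] + quality[dx] - skill[py] - quality[dy]
    return 1.0 / (1.0 + math.exp(-z))

def fit_player_deck(games, lr=0.5, epochs=2000, l2=0.01):
    """Fit player skill and deck quality by gradient ascent on the log-likelihood.

    games: list of (player_x, deck_x, player_y, deck_y, x_won) tuples.
    The L2 penalty keeps estimates finite when, e.g., a deck never loses.
    """
    players = {g[0] for g in games} | {g[2] for g in games}
    decks = {g[1] for g in games} | {g[3] for g in games}
    skill = {p: 0.0 for p in players}
    quality = {d: 0.0 for d in decks}
    n = len(games)
    for _ in range(epochs):
        g_skill = {p: -l2 * skill[p] for p in players}
        g_quality = {d: -l2 * quality[d] for d in decks}
        for px, dx, py, dy, x_won in games:
            # For a logistic model, d(log-likelihood)/dz = outcome - prediction.
            err = (1.0 if x_won else 0.0) - win_prob(skill, quality, px, dx, py, dy)
            g_skill[px] += err
            g_skill[py] -= err
            g_quality[dx] += err
            g_quality[dy] -= err
        for p in players:
            skill[p] += lr * g_skill[p] / n
        for d in decks:
            quality[d] += lr * g_quality[d] / n
    return skill, quality
```

If the same deck wins no matter who pilots it, the fit credits the deck; if the same player wins no matter which deck they hold, it credits the player.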

Then there's the game-state variance you allude to... Regardless of how good you are at MTG, and even how good your deck is, you're going to lose a lot of games due to mana flood / mana screw / etc. When that happens to either player, the outcome of the game does not contain useful information about skill. Of course, if you sample enough games you can still separate skill from chance, but using Elo on low-count datasets is bound to be misleading, because it was designed for games of pure skill, where every game outcome carries information about relative skill. Maybe you could establish some rules about which games are appropriate to use as indicators of relative skill, and which ones must be discarded?
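To put a rough number on "enough games": a back-of-the-envelope sketch, assuming games are independent, that a fixed fraction of them are decided purely by flood/screw (i.e. coin flips), and a normal approximation near a 50% win rate, where the standard error of the win rate is about 0.5/sqrt(n). The function name and the z = 2 threshold (roughly 95%) are assumptions.

```python
import math

def games_needed(edge: float, luck_fraction: float = 0.0, z: float = 2.0) -> int:
    """Approximate games before a true win-rate edge over 50% stands z standard errors clear.

    edge: how far above 50% the stronger player wins when skill decides the game.
    luck_fraction: assumed share of games that are effectively coin flips.
    """
    observed_edge = (1.0 - luck_fraction) * edge  # coin-flip games dilute the edge
    return math.ceil(z * z * 0.25 / observed_edge ** 2)
```

A 10-point edge (60% vs. 50%) needs about 100 games with no luck, 205 if 30% of games are coin flips, and 400 if half are. The dilution factor enters squared, which is why small samples mislead so badly.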

Anyway it's an interesting idea. Here's related reading for the MMR score used in Magic Arena:

https://hareeb.com/2021/05/23/inside-the-mtg-arena-rating-sy...

> With MTG, you have a huge effect from the quality of the deck. Unless you have each player play with each deck, there's no way to de-convolute the quality of the deck vs the quality of the player. And if you did have that data, Elo couldn't leverage it -- you'd need a more sophisticated model to account for that statistical effect.

How well a player chooses their deck is one of the factors that determines how good a player is. You can say the same thing about the other games: I'd probably have a better rating in chess if I didn't only play somewhat unsound gambits, and I'd definitely have a better rating in Starcraft if I didn't only do 2port wraith in TvZ.

  • I'm not sure if it's the same in Magic, but when I played Yu-Gi-Oh, how well made a deck was was also just an indicator of how much money you had.

  • I'm not so sure.

    A deck is something you have. A build order, or a chess opening, is something you _know_ and therefore more or less what I'd be comfortable calling skill.

    • In my experience playing MTG (and other card games), when players discuss skill, they generally mean a (somewhat fuzzy) combination of both deckbuilding and "piloting" ability. It's understandable to want to draw a line between the two and say "I only want to evaluate in-game decision-making", but not only is that wildly impractical (it's going to be really hard to develop a model which fairly accounts for the fact that your buddy Jeff only likes playing decks which do nothing for 50 turns and then win the game on the spot iff no one else has a counterspell[0]), part of the way card game leagues work (again, in my experience) is that players spend a lot of time trying to figure out how to make their decks better and adapt them to what other players are doing. If you can't capture that effort, I honestly think you might be missing the point a little bit.

      [0] Let's be clear, Jeff's deck is bad and he's going to lose a lot, even if he's a time-traveling supercomputer with the diplomatic finesse of Otto von Bismarck.

    • Beyond a certain level, everyone has access to all the cards they want. That might cause a poor fit at the lower levels, but at intermediate to advanced levels, it's not about ownership.

I actually came up with a play style variation that avoids the mana flood / screw and my son and I use it when we play. Honestly, I find it a lot more fun.

You split your deck into two stacks. One with land and one with everything else. For your starting hand, you take 3 land and 4 of everything else.

Each draw phase, you pick which stack you draw from.

That’s it. Everything else stays the same but mana floods / screws completely stop.
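For comparison, a rough sketch of how often a normal shuffled deck deals a badly screwed (0-1 lands) or flooded (6-7 lands) opening seven. It assumes a 60-card deck with 24 lands (a common ratio, but an assumption here) and ignores mulligans and later draws. Under the split-stack rule, the opening hand always has exactly 3 lands, so both numbers drop to zero for the opening hand.

```python
from math import comb

def p_lands_in_hand(k: int, deck: int = 60, lands: int = 24, hand: int = 7) -> float:
    """Hypergeometric probability of exactly k lands in the opening hand."""
    return comb(lands, k) * comb(deck - lands, hand - k) / comb(deck, hand)

screw = sum(p_lands_in_hand(k) for k in (0, 1))  # 0 or 1 land
flood = sum(p_lands_in_hand(k) for k in (6, 7))  # 6 or 7 lands
print(f"screw: {screw:.1%}, flood: {flood:.1%}")
```

With those assumptions, roughly 14% of opening hands have 0-1 lands and about 1.3% have 6-7.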

  • It’s good for teaching younger players who still have temper problems, but there’s only so much of the game you can experience this way. And don’t expect to get to advanced or expert strategies without the game balance falling apart.

    One knock-on effect I’d predict is that higher mana value cards would be substantially more playable. I expect a deck of walls + counterspells + removal + big finishers like the Eldrazi or even just Baneslayer Angel to be much more effective than it is now.

    On the other end of the spectrum super low to the ground aggro strategies also get a huge bonus by simply never having to draw a land again.

    Storm (play a bunch of cheap spells, typically with a discount or with effects that give you mana when you cast a spell) probably gets a huge boost as well, since Storm players can ensure they never fizzle out. Once the engine is going, they’ll always win unless they get countered.

    The decks that lose out here are the ones in the middle: the midrange, “fair” decks that are just trying to curve out with the best play each turn.

    And all that’s not counting the rules headache with cards like Oracle of Mul-Daya, Fact or Fiction, Treasure Hunt, or Dark Confidant. Which pile does my Maze of Ith go in? Cultivate? Sol Ring? Faceless Haven?

    With that said it’s also my personal opinion that variance just makes the game more enjoyable and widens the group of players you can compete against, as long as you have the emotional capacity to not take losses personally.

    • Variance is a critical lifeblood for card games, as the commander companion mechanic debacle demonstrated; it's a key lesson that MTG R&D knows but occasionally forgets. Still, not only at casual levels but at many levels, MTG can definitely have so much starting-hand variance that it starts to reduce fun. Tournament mulligan rules and MTG Arena's starting-hand sampling algorithm point to this.

      I personally suspect that aggro would completely dominate a separate land deck meta if pushed hard enough. But I'm all on board for an alternative game design that invents new interesting questions to ponder, addresses a pain point of mtg design, and most of all makes it more fun for a kid.

    • Fwiw, the number of cards we have to play with is limited to a few starter sets, so very specialized decks don't really happen. I got rid of all of my cards from the 90s (still kicking myself there).

      It's more that it keeps the game fun as he's just getting into it. You're guaranteed that both players are going to have playable draws.

      For any setting with more advanced players there would definitely be side effects and a more polished set of rules in place for those special cards and circumstances.

    • It is actually the opposite. If you choose your pile then low cost decks are better because you can choose to spend fewer draws on land. Expensive cards are better if you get a draw from each pile each turn.

The variance argument may be solid in general but I will say that mana flood and mana screw can be greatly alleviated through deck building and use of mulligan.

You don't often see it happen during high level play.

I used to be rather careless in how I planned the mana of my decks and rarely took a mulligan. I faced mana issues all the time. After putting more planning into my mana base and deciding on a careful strategy for when to take a mulligan, I now rarely experience those issues. When I do, it is mainly because I break my own rules out of greed, refusing to admit that a hand with great cards is too low on mana.

I agree that Elo falls rather short for multiplayer games (the article's approach probably converges much more slowly than an approach built around supporting multiplayer contests, if it converges at all, and the simplification for "board zaps" is likely just plain wrong, although that might be a limitation of how they recorded their games), but I don't think the substantial luck in individual MTG games should really impact the usefulness of Elo (or similar systems such as Glicko). After all, Elo is just trying to find ratings which best predict game outcomes, so good and bad draws should still be well modeled by that idea. In particular, for two given players (at a particular point in time and holding particular decks[0]), it stands to reason that you should still be able to find some pair of ratings Rx and Ry such that P(x beats y) = 1/(1 + 10^((Ry - Rx)/400)).
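To make the "some pair of ratings" point concrete: a sketch of the standard Elo expected-score formula and its inverse, i.e. the rating gap Rx - Ry that reproduces an observed head-to-head win rate exactly. For a single pairing that gap is also the maximum-likelihood estimate. The function names are made up for illustration.

```python
import math

def expected_score(r_x: float, r_y: float) -> float:
    """P(x beats y) under Elo: 1 / (1 + 10^((Ry - Rx) / 400))."""
    return 1.0 / (1.0 + 10.0 ** ((r_y - r_x) / 400.0))

def fitted_gap(wins_x: int, games: int) -> float:
    """Rating gap Rx - Ry whose expected score equals x's observed win rate.

    Undefined for a perfect or winless record, so wins_x must be strictly
    between 0 and games.
    """
    if not 0 < wins_x < games:
        raise ValueError("need at least one win and one loss")
    p = wins_x / games
    return 400.0 * math.log10(p / (1.0 - p))
```

A 6-4 record maps to a gap of about 70 points no matter how much of that record came from good or bad draws. Luck changes how noisy the estimate is, not whether a matching gap exists.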

That being said, the inherent randomness of MTG may mean that, in an ill-defined, abstract sense, it takes "more skill" to improve by 100 Elo points in MTG than in chess: X% of your games have no meaningful decisions, so you have fewer places to take advantage of your superior decision-making. This probably has real implications for reasonable choices of K if you're running, say, MTG Arena. But the article is pretty clear that they're not doing anything especially rigorous when picking K in the first place, and honestly (IMO) it probably doesn't matter a whole lot if you're running a Friday night beer league with some friends or whatever.

[0] I agree with the sibling comment that deck selection and deckbuilding are a large part of what Magic players mean when they discuss skill, and it seems very reasonable to allow those things to be included in our model.

Elo isn't misapplied here. It's just that when game results have a higher luck factor, you get a narrower distribution with shorter tails. You don't get those 2800 Elo players like in chess, who have virtually a 100% chance of beating nearly everyone every time. The best and worst players sit closer to the center, but there's still meaning behind the score.
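One way to see the "narrower distribution" effect, under the simplifying assumption that a fixed fraction of games (flood, screw) are pure coin flips and the rest go to the more skilled player with the usual Elo probability. The sketch computes the rating gap Elo would eventually settle on for a given underlying skill gap. The function names and the coin-flip model are assumptions, not anything from the article.

```python
import math

def win_prob(gap: float) -> float:
    """Elo win probability for a rating advantage of `gap` points."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))

def observed_gap(skill_gap: float, luck: float) -> float:
    """Rating gap Elo converges to if a fraction `luck` of games are coin flips."""
    p = (1.0 - luck) * win_prob(skill_gap) + luck * 0.5
    return 400.0 * math.log10(p / (1.0 - p))
```

A 400-point skill gap with 30% coin-flip games shows up as roughly 226 points, and shrinks to zero as luck approaches 100%. Ratings stay ordered by skill, just squeezed toward the middle.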

> Regardless of how good you are at MTG, and even how good your deck is, you're going to lose a lot of games due to mana flood / mana screw / etc.

What makes this different from blind build order choices in Starcraft? The greed > safe > rush > greed interactions often set one player ahead pretty arbitrarily in the very early game.

  • It's more like if mineral placement was randomized at the start of a game, with players having uneven access.

    What you are describing with the build order also exists. Many Magic decks have a single game plan ("rush", etc.) and can only minimally adapt between games in a match (by swapping cards with a 15-card sideboard). How uneven a matchup is can vary a lot, and some decks are hybridized, so it doesn't just devolve into rock/paper/scissors.

If the purpose of Elo's system was to predict the outcome of a game between two players who have no or limited prior interaction, it can be "misapplied" to great effect. While unfairness and randomness (starting as black vs. white in Chess) can bias and increase the variance of that estimate, it is still better than tossing a coin.
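"Better than tossing a coin" can be made measurable with log loss: a predictor that always says 50% scores ln 2 ≈ 0.693 per game, and anything lower is doing better than chance. A minimal sketch; the function name and the four games below are made up purely for illustration.

```python
import math

def log_loss(predictions, outcomes):
    """Mean negative log-likelihood of binary outcomes (1 = first player won)."""
    total = 0.0
    for p, won in zip(predictions, outcomes):
        total -= math.log(p if won else 1.0 - p)
    return total / len(outcomes)

# Always predicting 0.5 (a coin toss) scores ln 2 on every game.
COIN_FLIP_LOSS = math.log(2.0)

# Made-up example: Elo-style predictions for four games and who actually won.
preds = [0.7, 0.6, 0.8, 0.4]
results = [1, 1, 1, 0]
print(log_loss(preds, results), COIN_FLIP_LOSS)
```

Here the toy predictions score about 0.40 against the coin's 0.693. Bias and extra variance from luck push that number up, but a rating system only needs to stay below ln 2 to beat the coin.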