An interesting thing about this article is that the SVG image of the type matchup chart [1] has automatic translation embedded.
The type labels will be displayed in the language your browser is set to. I didn't even know this was possible.
[1] https://upload.wikimedia.org/wikipedia/commons/9/97/Pokemon_...
Oh that's really cool, I didn't know about this! I just linked to the wikimedia-hosted illustration, but that's a good perk too.
Wow, that's very cool. I was puzzled at first as to why the Pokémon types were in Finnish!
It's using the <switch> tag for this
https://developer.mozilla.org/en-US/docs/Web/SVG/Reference/E...
However, as with many of these obscure features, I am not so sure it works well in practice. The Windows 11 laptop I'm viewing that SVG on has language support enabled for English, French and Russian, and among mostly English labels I'm getting a few stray "Psychique" and "Привидение" types in the SVG. I have no idea how it chooses which one to show there.
Just a wild guess, but perhaps the order of the translations varies across cells. Perhaps the browser just picks the first one that matches your supported locales.
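For what it's worth, that guess matches how I understand `<switch>` to work: the direct children are checked in document order, and the first one whose `systemLanguage` matches any of the user's languages is rendered. Here is a rough Python model of that rule. The cell contents and the `pick_label` helper are made up for illustration, and the matching is a simplification of the real spec.

```python
def pick_label(children, user_langs):
    """Return the text of the first child whose systemLanguage matches.

    children: list of (system_language, text) pairs in document order.
    system_language is a comma-separated string, or None for a fallback
    child that carries no systemLanguage attribute.
    """
    prefs = [u.strip().lower() for u in user_langs]
    for system_language, text in children:
        if system_language is None:
            return text  # a child without the attribute always matches
        for tag in system_language.lower().split(","):
            tag = tag.strip()
            # Simplified rule: exact match, or user pref is a prefix
            # of the tag followed by "-" (e.g. "en" matches "en-US").
            if any(tag == u or tag.startswith(u + "-") for u in prefs):
                return text
    return None


# Hypothetical cells: same translations, different child order.
cell_a = [("en", "Ghost"), ("ru", "Привидение"), (None, "Ghost")]
cell_b = [("ru", "Привидение"), ("en", "Ghost"), (None, "Ghost")]

user = ["en", "fr", "ru"]
print(pick_label(cell_a, user))  # Ghost
print(pick_label(cell_b, user))  # Привидение
```

Note that the order of the user's preferred languages never enters the decision in this model, only the order of the children, which would explain stray French and Russian labels on a mostly English chart.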
Love this!
My general problem with Pokémon (at least the older versions, haven't played the latest) is that when playing against others it frequently just boils down to the same set of legendary and overpowered mons.
You sort of addressed this by running the MILP without certain mons as options, which makes sense.
But you already have the machinery for a better constraint: max total base stat. You could think of it as "weight classes" in boxing.
So, for a given weight class, your team can only add up to Y in total base stat. You can squeeze one of the OP mons, but then the rest are slackers. Or you could balance them.
It makes it a lot more interesting and invites diversity. And you could run it for many different values of Y.
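A minimal sketch of the weight-class idea, assuming a tiny made-up pool where each mon has a base stat total and some separate "power" score. Brute force over combinations stands in for the MILP solver here, and every name and number below is hypothetical.

```python
from itertools import combinations

# Hypothetical pool: name -> (base stat total, "power" score).
POOL = {
    "A": (680, 10),  # the overpowered one
    "B": (600, 8),
    "C": (540, 7),
    "D": (500, 6),
    "E": (420, 4),
    "F": (300, 2),
}


def best_team(pool, team_size, max_total_bst):
    """Best-scoring team whose summed base stats stay within the cap.

    Returns (score, total_bst, team) or None if no team fits.
    """
    best = None
    for team in combinations(sorted(pool), team_size):
        total_bst = sum(pool[m][0] for m in team)
        if total_bst > max_total_bst:
            continue  # over the weight class limit Y
        score = sum(pool[m][1] for m in team)
        if best is None or score > best[0]:
            best = (score, total_bst, team)
    return best


for cap in (1300, 1500, 1700):
    print(cap, best_team(POOL, 3, cap))
# 1300 (13, 1260, ('C', 'E', 'F'))
# 1500 (18, 1480, ('A', 'D', 'F'))
# 1700 (22, 1700, ('A', 'B', 'E'))
```

In this toy pool the lowest class cannot afford the strongest mon at all, while the middle class squeezes it in next to two weaker picks, which is the trade-off described above. In a real model the cap would simply be one more linear constraint on the selection variables.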
It's a good idea but base stats aren't everything: https://youtu.be/gEkMi_y3Wzo
That's why the competitive scene maintains a listing of tiers across generations, derived from analyzing actual usage across thoughtful battles. https://www.smogon.com/sm/articles/sm_tiers
In competitive Pokémon there are usually different tiers that determine which Pokémon are accepted. In most of them, legendaries are limited or fully banned.
For the mainline games it usually does not matter. You can beat them with pretty much any single Pokémon.
The median legendary pokemon in any given generation is typically quite mediocre in terms of viability/power, so I'm not sure it's quite correct to say that most tiers are free of legendaries. Even in ZU, the lowest possible tier, you have Pokemon like Mesprit, Regirock and Articuno kicking around in spite of their relatively high stats.
It is completely true to say that the so-called 'box legendaries' specifically - with base stat totals in the 670-700 range - tend to be excellent Pokemon and with rare exceptions are banned from 'standard' formats for being overcentralizing.
If anyone wants to see an example of this, JRose11 on YouTube is doing exactly this for Pokemon Red/Blue and putting a list together of completion times. It’s been a multi year project but I’d totally recommend starting at the beginning and losing yourself in the rabbit hole.
He also does some bad Pokemon challenges to see if it’s possible to finish with, say, a Weedle (mostly no).
You'd need something which better encapsulates the power level of a pokemon; move pools, abilities, typing and synergies all play a role. There are bad pokemon with a huge base stat total, and a hypothetical pokemon with a tiny base stat total of 50 but with ground/poison typing and the wonder guard ability would be an absolute nightmare.
The Smogon tiers (NU, RU, UU, UUBL, OU, Uber) are a thorough attempt at placing all pokemon in tiers based on overall power level, so as to make for interesting, balanced matches at each tier.
That's a great idea!
Follow-up thought: it would be cool to imagine a match consisting of three sets, each for a different class.
So each player comes with three teams, one each for the small, mid and large base stat classes. You can't repeat a monster across teams. Whoever wins 2 of the 3 sets wins the match.
...
And if this was my college house, we would have a price system for the mons so you wouldn't be able to repeat mons even between players. But that's a different thing altogether.
That property is true of most games though, unless they are heavily optimized for 'balance'. Consider chess, poker, or soccer: if both players play correctly, a huge range of strategies is easily exploitable and thus goes unplayed.
That said, complexity emerges and explodes in the tiny differences. Even if there are only 8 pokemon that are on 95% of teams, and 10 situational pokemon that appear 5% of the time, that's C(8,6) = 28 possible teams of Uber pokemon, plus a buttload of possible teams with situational pokemon like Choice Scarf Ditto, Eviolite Chansey, Nuzzle/U-turn/Super Fang Pachirisu, etc...
Even if the teams were the same, the possible differences in movesets alone create a lot of different sets: suppose there are 6 viable moves for each mon, then you have C(6,4) = 15 sets for each mon, each with its own probability weight as well.
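The counts are easy to check with Python's `math.comb`: choosing 6 staples out of 8 gives 28 teams, and choosing 4 moves out of 6 gives 15 movesets per mon. The 8 and 6 are just the illustrative figures from the comment, not real usage data.

```python
from math import comb

teams_from_staples = comb(8, 6)   # 6-mon teams drawn from 8 staples
movesets_per_mon = comb(6, 4)     # 4 moves chosen from 6 viable ones
# Every mon on a fixed team picks its moveset independently.
moveset_combos_per_team = movesets_per_mon ** 6

print(teams_from_staples)         # 28
print(movesets_per_mon)           # 15
print(moveset_combos_per_team)    # 11390625
```

So even a "solved" pool of 8 staples leaves over eleven million moveset combinations per team, before items or situational picks come into play.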
Right on time for 30th anniversary! https://xcancel.com/Pokemon_cojp/status/2006379822012911872
Translated text:
The 30th anniversary of Pokémon begins! It's been 30 years since the release of "Pokémon Red and Green." Pokémon will celebrate its 30th anniversary on Friday, February 27, 2026. This year is going to be the best year yet! Stay tuned! #Pokémon30thAnniversary
damn, everyone is having such well thought out conversations, and I just scrolled to see the Pokémon that were selected.
I think I'm one of those people who worry that if I understand the math of it all, it'll lose some of the magic that made me keep my Raticate for no other reason than that it was the first pokemon that accepted me without me dealing any damage to it.
Base stat total alone is a bad metric, because stat distribution is equally important.
If the stats are distributed heavily both on attack and special attack, it's usually bad because you generally want specialist attackers and these stats could be better somewhere else like speed.
Absolutely! In general I would expect a better model to incorporate a lot of weighted terms in the objective to choose less "extreme" solutions, but here I was mostly interested in illustrating the method.
It was very impressive at that, congratulations.
You can go for smogon tiers as a proxy for pokemon strength.
If you find the base game too easy, I can recommend the IronMON challenge: You can only use one mon, permadeath, stats are randomized, all trainer levels are buffed by 1.5x and you can't level up on wilds. Along with numerous other rules to make it harder. There are variants that are borderline impossible to beat, like Super Kaizo IronMON. Out of hundreds of thousands of attempts, it has only been beaten once. Would make for an interesting optimization problem.
https://github.com/PyroMikeGit/SuperKaizoIronMON
This thread is a great example of how quickly “optimize X” turns into “define X” when modeling intuition meets player intuition.
Great article but I’d say you’re optimizing for the wrong metric here. For in game playthroughs, offense > defense and especially speedy offense beats anything else.
I’d state it as: given any type, we should be able to hit it for super-effective damage with at least 1 move. And instead of taking raw BST, I’d take Max(SPD+ATK, SPD+SPA), where SPD is Speed, to favour speedy offense.
Of course this does not take into account the thorny question of availability. Metagross is top tier but only available post game in its debut. On the other hand Crobat and Gyarados are readily available early on in many of the games and evolve fairly quickly.
Please look into the competitive Nuzlocke community, there are a lot of damage calculations and viability spreadsheets all around, you’ll find it interesting.
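A rough sketch of the two ideas in the comment above: the Max(SPD+ATK, SPD+SPA) score and the "hit every type super-effectively" check. The stat lines are placeholders rather than real Pokémon, and the type chart is cut down to the Fire/Water/Grass triangle purely to keep the example short.

```python
def speedy_offense(stats):
    """Speed plus the better of the two attacking stats."""
    return max(stats["speed"] + stats["attack"],
               stats["speed"] + stats["sp_attack"])


# Toy chart: attacking move type -> defending types it hits super-effectively.
SUPER_EFFECTIVE = {
    "Fire": {"Grass"},
    "Water": {"Fire"},
    "Grass": {"Water"},
}


def uncovered_types(team_move_types, all_types):
    """Defending types that no move on the team hits super-effectively."""
    covered = set()
    for move_type in team_move_types:
        covered |= SUPER_EFFECTIVE.get(move_type, set())
    return sorted(set(all_types) - covered)


# Placeholder stat lines, not real Pokémon.
fast_physical = {"speed": 130, "attack": 90, "sp_attack": 70}
slow_mixed = {"speed": 50, "attack": 110, "sp_attack": 110}

print(speedy_offense(fast_physical))  # 220
print(speedy_offense(slow_mixed))     # 160
print(uncovered_types(["Fire", "Water"], ["Fire", "Water", "Grass"]))  # ['Water']
```

In an optimization model the score would replace BST in the objective, and the coverage check would become one constraint per defending type requiring at least one selected mon with a matching move.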
If we're really trying to optimize for everything, I'd argue two of the biggest factors are move sets and (in generations after 2) abilities. There are Pokemon with great stats but abilities that quite literally are intended to be drawbacks (e.g. Slaking and Regigigas).
Thank you for your suggestion, I agree with you (and another commenter) that base stat is not that useful, and availability is actually what I would prioritise on in a next iteration. I tried to keep it simple here, mostly because it was interesting enough as an analysis. But if I were to redo this to get _the best_ team in a generation, I'd definitely go with what you suggested!
Why is y+2x optimal at (0,3) with a value of 3? Isn't it (3,0) with a value of 6?
Good catch! Especially since I ended up drawing y - x = C but didn't update the legend. I updated it!
Haha, I started reading this, got interrupted, came back and got confused by the graph. Then came to the comments, saw your comment, reloaded the post and voila!
Thank you for a lovely post!
you're right, it should be (3,0) with optimal obj value of 6.
My uni course on optimization was so much fun but I forgot all of it. This was a nice reminder that I should probably revisit the basics :)
> I was always hunting for Pokémon with better abilities, better type coverage, analyzing synergy between moves… If you’ve ever played a mainline Pokémon game before, you must know how utterly unnecessary this is. Twenty years ago, I would have just powered through on Blastoise or Typhlosion alone.
I definitely beat the first Pokemon games with a level 100 Charizard. I even defeated gyms that were strong against fire types, often KO'ing Pokemon in one hit. The text would say "It's not very effective..." and then the opponent's health bar would drop to zero. So yeah, these games are easy enough that a 10yo can get by with twinking out a single pokemon. Makes the blog post even funnier
Slaking can only attack every other turn making it a bad choice outside of niche teams.
I do comment on that in the article, I think it's a nice example of how your model can only know what you tell it (the one I used in the article doesn't know about abilities).
I'm curious how it would rank existing teams–for example, are there trainers who pick better teams (of course, I am sure the bug catchers get soundly trounced). Surely Cynthia or Red have a strong team?
Would have been nice to see with only first gen pokemon which were much better balanced IMHO.
I'm not sure that theory bears out. At least in terms of competitive meta, my understanding is that gen 1 is pretty much always the same Pokemon on most teams (although that's also partially from fewer choices and the lack of newer features that provide ways to build alternative strategies like held items and abilities).
this was a great read to start the new year! having worked extensively with mixed integer programs, it is always a bit disheartening to see them not used enough for everyday decision-making. one of my goals this year is to create a layer to make it easier to formulate mips and test them, via plain text input. this would hopefully increase adoption through a lower barrier to entry.
Lots of people working in IT have tattoos, I like to see what theme/image overlap they have.
Three people in my current workplace have a balloon tattoo (interestingly all of them are red balloons). Five people in my current workplace have a Pokémon tattoo that is easily visible.
Edit: Including myself, on both counts, I should have said.
>balloon tattoo
What does it mean?
A tattoo of a balloon! Unless you meant what the meaning of the design was, and in that case different people have different associations and meanings.
One of my forearms is covered in things my son used to be obsessed by when he was young, which is why I have a lego figure, a pikachu, and a red balloon as depicted in the book "Goodnight Moon" which I read to him every night for 3+ years.
which Pokémon? gotta name them all! (5)
I wish I could remember, but offhand all I can say is that we definitely have two pikachus and one snorlax.
The SVG chart has internationalization built-in, with multiple languages available. I thought that was cool.
I would've liked to see in conclusion a recommended starter team per generation! Very nice article!
I was planning in a future sequel/update to do this but with "better" constraints like only including Pokémon available in a game, etc... Maybe even separate it into early/mid/late-game availability since most optimal Pokémon are late-game anyway.
While I like the optimization analysis, the initial assumptions are very wrong. I know the author mentioned that they are optimizing for stats + being resistant to other Pokémon types, but that analysis will lead to very bad results.
There are Pokémon with certain abilities or tricks that make them much better than legendary ones, with certain move sequences that could wipe the entire other team.
There are also Pokémon with certain types that are actually good against matchups where the 'data' would say otherwise.
Maybe the analysis could be done better if you instead analyzed match data.
BTW, the way people have played Pokémon for many years is to also divide the Pokémon into tiers, and in a competitive setting you are only allowed to pick Pokémon from the same tier or lower. This adds another level of complexity.
I generally agree with you on the point that a "good Pokémon team" can be better encapsulated by other attributes, including those you mentioned. I would disagree that the assumptions are very wrong, because I am not assuming that the objective and constraints chosen are ideal or even good enough; I am keeping them simple for illustration purposes.
I actually found it interesting that in spite of what is a clearly overly simple model, the non-legendary non-multi-starter you eventually get is quite a good one, in my opinion better than what the naive constraints would lead me to think.
Also, keep in mind that I'm not talking about competitive matches here, just mainline gaming. To that end, types are usually all you need, and in that area the main thing I would do is generalize the type constraints to not be just defensive, but to also ensure each resistant Pokémon has a good enough attack against that type.
In my opinion, abilities, natures and items are:
1. Too complex for such models (MIPs are still exponential-time)
2. Overkill as a strategy when all you wanna do is beat the league
But that last part is just my opinion.
Small typo(?):
> Mewtwo (#151)
Should be 150
Thank you, you're right! For some reason I always forget Mew comes after Mewtwo in the Pokédex...
I think it's a common mistake. It would be much more intuitive for the one literally labeled "two" to come after; I imagine it doesn't work like that because they wanted the mythical one last.
For games that are as complex as pokemon, it's usually necessary to restrict analysis to some subset. In this case team typing was used.
I personally like restricting to generation 1, as it is very canonical, very static, and one of the simplest.
Furthermore I like the 1v1 format, where instead of a team it's just 1 pokemon vs another. Otherwise you have to resort to heuristics.
But even with a 1v1 and generation 1 restriction it still isn't solved!
Even for a single matchup, it's very complex to arrive at a theoretical mathematical formulation, and still quite burdensome to write a Monte Carlo simulation.
For example:
Tauros vs Gengar (not an uncommon matchup in competitive gen 1)
Hypnosis has 60% accuracy, and Tauros can sleep for 1 to 6 turns with equal probability. Tauros can 2HKO with Earthquake, but can also crit. Gengar can 4HKO, with each crit counting as a double hit (both crits having roughly a 20% chance).
The question of who has the advantage is, to my knowledge, unsolved (also consider that in 1v1 the answer is different, as in teams you only have 1 sleep, so Gengar wastes it). It's also different from the problem of choosing the actual correct move: not only do you need to find the best first move, but you need a decision for each node in the game's decision tree. For example, if Tauros has 60% HP and Gengar has 100% HP, is it still better to go for Hypnosis, or better to go for damage and hope for 1 crit out of 2? This is all made more complex by the fact that the two mons are speed tied, so who attacks first is yet another random event.
https://www.smogon.com/forums/threads/gengar-vs-tauros-1v1-w...
For simple gen 1 with hidden teams, I think the game tree is bigger than that of chess, or even poker. The fact that it's stochastic with hidden information makes it very similar to poker analysis-wise; I bet Counterfactual Regret Minimization (CFR) approaches would work as well.
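To give a feel for what even a crude Monte Carlo of the Tauros vs Gengar example looks like, here is a sketch that takes only the numbers quoted above (60% Hypnosis accuracy, 1 to 6 sleep turns, 2HKO vs 4HKO, roughly 20% crits counting as double hits, coin-flip speed tie). Everything else is my own simplification and not real gen 1 mechanics: attacks never miss, a crit counts double for both sides, both players follow a fixed policy, and a sleeping Tauros simply loses its action each turn.

```python
import random


def simulate_once(rng, gengar_uses_hypnosis):
    """Play one simplified 1v1 and return the winner's name."""
    tauros_hp, gengar_hp = 4, 2  # HP measured in "hits" (4HKO vs 2HKO)
    sleep_turns = 0
    while True:
        order = ["gengar", "tauros"]
        if rng.random() < 0.5:  # speed tie: coin flip each turn
            order.reverse()
        for actor in order:
            if actor == "gengar":
                if gengar_uses_hypnosis and sleep_turns == 0:
                    if rng.random() < 0.6:  # Hypnosis accuracy
                        sleep_turns = rng.randint(1, 6)
                else:
                    tauros_hp -= 2 if rng.random() < 0.2 else 1
                    if tauros_hp <= 0:
                        return "gengar"
            elif sleep_turns > 0:
                sleep_turns -= 1  # asleep: Tauros loses its action
            else:
                gengar_hp -= 2 if rng.random() < 0.2 else 1
                if gengar_hp <= 0:
                    return "tauros"


def gengar_win_rate(trials, gengar_uses_hypnosis, seed=0):
    """Fraction of simulated games won by Gengar under a fixed policy."""
    rng = random.Random(seed)
    wins = sum(
        simulate_once(rng, gengar_uses_hypnosis) == "gengar"
        for _ in range(trials)
    )
    return wins / trials


print("always attack:  ", gengar_win_rate(20000, False))
print("hypnosis policy:", gengar_win_rate(20000, True))
```

This only compares two hard-coded Gengar policies against a Tauros that always attacks. The hard part described above, finding the best decision at every node of the tree for both players, is not addressed by this kind of simulation at all.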
Now all we need is a quick vibe coded web GUI front end
Username checks out