Comment by alasdair_

4 years ago

I set up the production servers for Pokemon GO using my personal email address as the owner. When we hit 120 million users and all the servers melted, guess which address every single one of those people was told to email about the problem?

It took a script running for five days to delete everything.

Can I just say I would love a detailed breakdown of the glorious mess behind the Pokemon GO launch. As someone who wanted to play from the very start and couldn't even log in, and as a DevOps person who empathized heavily, I'd love to hear all about the firefighting behind the scenes during those first few hours to weeks.

  • I’d love to write one. I need to take a look at my NDA again just to make sure I don’t infringe on anything.

    I can say one thing: the problems were heavily exacerbated by a few botters trying to scrape the entire planet's worth of data every minute so they could charge money for realtime maps of everything. Every time we’d shut them out, they would find another way around the limits. Not fun when we were already so overloaded with real users.

  • Seconded - is there any article anywhere about the technical side of these early days?

Oh man. I'm assuming you used your Gmail or Microsoft account, which can handle that kind of influx. Imagine if that happened on a late 90s ISP-run email account running on a single server. Ouch.

  • A while ago I worked on an SMS gateway and somebody had entered their own phone number in some sort of test. Somebody tested it, and there was a bug which triggered 100s of messages to be sent to their phone.

    This was over a decade ago, when the messages were stored on the SIM and there was a limit as to how many it could hold (something like 20). So you just fill up the limit and that's it, right? Nope, the carrier helpfully buffers messages that can't be received, and they will be sent/received when there is space on the device. I can't remember how they resolved it in the end, I guess by just waiting for the messages to expire (72 hours).

  • I once tried to apply a hotfix in prod by opening the PHP file over FTP, modifying it inline, and just saving. What I was fixing was the email sending logic. Since it had been a long time since I had written any PHP, I forgot to add a $ before my i-variable in the loop. It still ran, though, just getting stuck with index=0 and sending email after email. Luckily my user was the first one in the db so no one else was affected.

    But since PHP running on a managed host isn't something one can easily "shut off", it ran until it timed out, sending thousands of emails to my Gmail. While Google could handle it, it ended up locking my account for a few days with an error every time I tried opening the inbox.

    Luckily my domain / provider didn't get blocked or spam-listed in the future.

Just curious, how is something like this typically handled/addressed internally? Is it one of those live-and-learn situations, or was there any consequence other than your email getting flooded?