Comment by macNchz

4 years ago

> I kick-started the bot on a server remotely and only then discovered it's an endless loop. It required server admin rights to stop it, which I did not have.

That sinking feeling when you realize you've started something bad and can't stop it always gives me a visceral feeling like the world is doing a dolly zoom* around me.

I have had two really fun bulk email screwups:

The first one started when we hit the the send button for a mass email campaign driving traffic to our newly launched website redesign. We immediately realized that the email marketing software had put a unique query parameter in every link that was resulting in all requests to our cache going to origin, which instantly smoked the little vm hosting the site, sending thousands of clicks to the now famous 503 Guru Meditation page. With the infrastructure folks offline in another timezone, it was the perfect environment to learn Varnish Configuration Language on the fly with the whole marketing team hanging over me with looks of horror on their faces!

The second one involved coming to work, sitting down with my coffee and noticing that our email sending process had crashed overnight. Given the rate they could be sent sequentially I realized we'd have a big backlog of tasks, so I wrote a quick shell script to split the tasks into separate lists and parallelize them across the number of cpus on our (big, colocated, dedicated, hosting many important apps and websites) server. As soon as I ran it I sent it to the background and opened up `top`, only to see thousands and thousands of forks of my process filling the list. By the time I realized that I'd flipped the numbers and split 4 email tasks into each of 50,000 processes instead of 50,000 tasks into 4 processes, the server locked up and my SSH session disconnected. Cue several panicked minutes of our apps being offline while I scrambled for the restart button in the remote management console. Somehow there were no lasting effects, and every service started up on its own when the box came back online, even though it hadn't been restarted in several years.

* https://filmschoolrejects.com/wp-content/uploads/2021/01/Jaw...

Beautiful screw-ups, thanks for sharing.

On a serious note, the trend where development and admins now operate in a single blur, I find concerning. It may now be quite common for a front-end developer to also do all kinds of potentially disastrous admin/infra changes.

I think this is in particularly true with all the cloud stuff. 20 years ago, adding new servers to our DC would be a 6 month process. Now I can accidentally spin up 500 in 1 second.

  • > On a serious note, the trend where development and admins now operate in a single blur,

    It is I know enough about admin stuff to know that I don't know much of the endless amount of small but "for production" very important things which you can get away without on a local dev setup and which don't tell you they are wrong. So it will just not work grate and you have no idea why and might even think it's a problem of your software instead of your setup.

    Doing admin properly is as big of a job as any programming task and it's a very different field of expertise.

    And docker doesn't fix it, at all. It at best improves the illusion of you doing admin stuff correctly.

>That sinking feeling when you realize you've started something bad and can't stop it always gives me a visceral feeling like the world is doing a dolly zoom* around me.

The largest unit of time known to mankind, the "Ohnosecond".