Comment by theideaofcoffee

3 hours ago

Same here: my time at an F100 ecommerce retailer showed me the same thing. Every change control board justification needed an explicit back-out/restoration plan with exact steps, what would be monitored to verify the change was behaving, contacts for the main groups expected to be affected, and emergency numbers/conference rooms for quick bridges if something actually did go wrong.

The process was pretty tight: almost no revenue-affecting outages that I can remember, because it was such a collaborative effort (even though presenting to the board felt a bit spiky and confrontational at the time, everyone was working together).

And you moved at a glacial pace compared to Cloudflare. There are tradeoffs.

  • Yes, of course, I want the organization that inserted itself into handling 20% of the world's internet traffic to move fast and break things. Like breaking the internet on a bi-weekly basis. Yep, great tradeoff there.

    Give me a break.

• While you're taking your break, exploits gain traction in the wild, and one of the value propositions of a service provider like Cloudflare is catching and mitigating these exploits as fast as possible. From the OP, this outage was related to handling a nasty RCE.

    • Lest we forget, they initially rose to prominence by being cheaper than the existing solutions, not better, and I suppose this is a tradeoff a lot of their customers are willing to make.

• But if your job is to mitigate attacks/issues, then things can be very broken while you're being slow to mitigate them.