Comment by theideaofcoffee

3 hours ago

Same here: my time at an F100 ecommerce retailer showed me the same thing. Every change control board justification needed an explicit back-out/restoration plan with exact steps, what would be monitored to verify the change was behaving, contacts for the main groups expected to be affected, and emergency numbers/conference rooms for quick bridges if something actually did go wrong.

The process was pretty tight: almost no revenue-affecting outages that I can remember, because it was such a collaborative effort (even though presenting to the board felt a bit spiky and confrontational at the time, everyone was working together).

And you moved at a glacial pace compared to Cloudflare. There are tradeoffs.

  • Yes, of course, I want the organization that inserted itself into handling 20% of the world's internet traffic to move fast and break things. Like breaking the internet on a bi-weekly basis. Yep, great tradeoff there.

    Give me a break.

• While you're taking your break, exploits gain traction in the wild, and one of the value propositions of a service provider like Cloudflare is catching and mitigating these exploits as fast as possible. From the OP, this outage was related to handling a nasty RCE.

    • Lest we forget, they initially rose to prominence by being cheaper than the existing solutions, not better, and I suppose this is a tradeoff a lot of their customers are willing to make.

• But if your job is to mitigate attacks/issues, then things can be very broken while you're being slow to mitigate them.