← Back to context

Comment by vlovich123

4 hours ago

> doesn't really justify the decision to use the same deployment system responsible for the 11/18 outage

There’s no other deployment system available. There’s a single system for config deployment and it’s all that was available as they haven’t yet done the progressive roll out implementation yet.

> There’s no other deployment system available.

Hindsight is always 20/20, but I don't know how that sort of oversight could happen in an organization whose business model rides on reliability. Small shops understand the importance of safeguards such as progressive deployments or one-box-style deployments with a baking period, so why not the likes of Cloudflare? Don't they have anyone on their payroll who warns about the risks of global deployments without safeguards?

Ok. Sure But shouldn't they have some beta/staging/test area they could deploy to, run tests for an hour then do the global blast?

  • Config changes are distinctly more difficult to have that set up for and as the blog says they’re working on it. They just don’t have it ready yet and are pausing any more config changes until it’s set up. They just did this one in response to try to mitigate an ongoing security vulnerability and missed the mark.

    I’m happy to see they’re changing their systems to fail open which is one of the things I mentioned in the conversation about their last outage.