Comment by duxup
6 years ago
I used to work as a network engineer for a while, and now do web development. I worked with a number of cloud providers, and you always have to roll out any fix carefully, even if you're 100% sure (you're never 100% sure) that you've got the fix.
I honestly just assumed that when customers chose where they would try things outside their lab, it was lower-tier customers, a less busy part of the network, anywhere the impact isn't as serious. That's where the lowest risk is.
Some customers would discuss their own customers by name, as in "Should we try this change on Customer Y?", and the discussion would proceed along those lines.
When I started deploying my own software, I just assumed anything that I was deploying to for free was a sort of "lab light" for them. I also don't mind; it seems fair.
ANY change outside a lab... is its own experiment.
Lowest risk, yes, but not bulletproof.
Smaller customers don't have the same web traffic, and what they do have may not be enough to trip any given failure scenario. One could imagine that the backtracking in an onerous regex is only triggered by a sufficiently large customer that has a path that is especially difficult to match.
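To make the regex point concrete, here is a rough sketch in Python. The pattern and inputs are made up for illustration (a textbook nested quantifier, not anything from a real incident): short inputs fail fast, while a slightly longer input that almost matches forces the engine to backtrack through exponentially many ways of splitting it.

```python
import re
import time

# Hypothetical pattern with a nested quantifier; an assumption for
# illustration, not a rule taken from any real deployment.
PATTERN = re.compile(r"(a+)+$")


def time_match(text):
    """Return (matched, seconds) for one attempt to match PATTERN against text."""
    start = time.perf_counter()
    matched = PATTERN.match(text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    # The trailing "!" makes the match fail, so the engine tries every
    # way of splitting the run of "a" characters before giving up.
    for length in (5, 10, 15, 20):
        matched, seconds = time_match("a" * length + "!")
        print(f"length={length:2d} matched={matched} seconds={seconds:.6f}")
```

A test suite full of short paths, or a small customer's traffic, could exercise this pattern all day without ever hitting the slow case.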
With a staged rollout and without a "fast" deploy procedure, by the time the change hits the larger customers, it has already been deployed to some percentage of the fleet, and then you still have a problem across a significant proportion of your fleet.
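A toy model of that, with every number invented for illustration: customers are rolled out smallest first, the failure only appears at or above some traffic level, and the function reports how much of the fleet is already running the change when the failure first shows up.

```python
def exposure_at_detection(traffic_by_customer, stage_fractions, trigger_traffic):
    """Return the fraction of the fleet already running the change when the
    failure first appears, or None if no stage ever triggers it.

    All inputs are hypothetical: per-customer traffic figures, cumulative
    rollout fractions per stage, and the traffic level that trips the bug.
    """
    # Roll out to the smallest customers first, as described above.
    ordered = sorted(traffic_by_customer)
    total = len(ordered)
    for fraction in stage_fractions:
        deployed = ordered[: round(total * fraction)]
        # The failure is only visible once a big enough customer is included.
        if any(traffic >= trigger_traffic for traffic in deployed):
            return len(deployed) / total
    return None


if __name__ == "__main__":
    # Made-up fleet: 40 small, 50 medium, 10 large customers.
    fleet = [10] * 40 + [1_000] * 50 + [100_000] * 10
    stages = [0.01, 0.10, 0.50, 1.0]
    print(exposure_at_detection(fleet, stages, trigger_traffic=1_000))
    print(exposure_at_detection(fleet, stages, trigger_traffic=100_000))
```

In this made-up fleet, a bug that needs medium traffic is first seen with half the fleet deployed, and one that needs large-customer traffic is not seen until the rollout is complete.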
Staged rollouts are an entirely reasonable risk mitigation idea, mind you, and not one I'm even arguing against.
My point is that unfortunately it's no panacea, especially at scale. Which is what makes this all an experiment.