
Comment by buildzr

6 years ago

> Then we moved on to restoring the WAF functionality. Because of the sensitivity of the situation we performed both negative tests (asking ourselves “was it really that particular change that caused the problem?”) and positive tests (verifying the rollback worked) in a single city using a subset of traffic after removing our paying customers’ traffic from that location.

Haha, so the free customers are crash test dummies for providing test traffic. Nice.

I actually don't mind that much, considering it's basically bulletproof DDoS protection for free. I'd much rather "be the product" in this way than in the way ad companies cause at least.

Seems fair. You have to roll it out to someone first, so why not roll it out to the users who are not paying for their service.

I used to work as a network engineer for a while, and now do web development. I worked with a number of cloud providers, and you always have to roll out any fix carefully, even if you're 100% sure (you're never 100% sure) that you've got the fix.

I honestly just assumed that when customers chose where they would try things outside their lab, it was lower-level customers, a less busy part of the network, anywhere the impact isn't as serious. That's where the lowest risk is.

Some customers would discuss their own customers by name, as in "Should we try this change on Customer Y?", and the discussion would proceed along those lines.

When I started deploying my own software, I just assumed anything that I was deploying to for free was a sort of "lab light" for them. I also don't mind; it seems fair.

ANY change outside a lab... is its own experiment.

  • Lowest risk, yes, but not bulletproof.

    Smaller customers don't have the same web traffic, and what they do have may not be enough to trip any given failure scenario. One could imagine that the backtracking in an onerous regex is only triggered by a sufficiently large customer that has a path that is especially difficult to match.

    With staged rollout and without a "fast" deploy procedure, by the time it hits the larger customers, it's already been deployed to some percentage of the fleet - and then you still have a problem, with a significant proportion of your fleet.

    Staged rollouts are an entirely reasonable risk mitigation idea, mind you, and not one I'm even arguing against.

    My point is that unfortunately it's no panacea, especially at scale. Which is what makes this all an experiment.
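
To make the backtracking point concrete, here is a toy sketch. It is not Cloudflare's rule or engine; the pattern `^(a+)+$` and the `backtrack_steps` helper are made up for illustration. It counts how many group boundaries a naive backtracking matcher tries. Input that matches costs one step, while a near miss of the same length costs exponentially many, so whether you ever see the problem depends entirely on which inputs your traffic happens to contain.

```python
def backtrack_steps(text: str) -> int:
    """Count the attempts a naive backtracking matcher makes for ^(a+)+$.

    Hypothetical helper for illustration only; real regex engines differ.
    """
    steps = 0

    def match_groups(pos: int) -> bool:
        nonlocal steps
        # Find how far a single a+ group could extend from pos.
        end = pos
        while end < len(text) and text[end] == "a":
            end += 1
        # Greedy: try the longest group first, then back off one char at a time.
        for stop in range(end, pos, -1):
            steps += 1
            if stop == len(text):
                return True  # consumed the whole input, so $ matches
            if match_groups(stop):
                return True
        return False

    match_groups(0)
    return steps


benign = backtrack_steps("a" * 10)         # matches on the first attempt
hostile = backtrack_steps("a" * 10 + "b")  # fails only after trying every split
print(benign, hostile)
```

Each extra "a" in the failing input doubles the work, so a path that never shows up in a small customer's traffic can stay invisible through the early stages of a rollout.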

Or you can say all customers were affected but some localized free-tier customers got the fix first.

  • In this case, yes. However, they also indicate this is how they do their staged rollouts in general. So if they release any other software update that goes through the staged rollout, it is tested on free customers first. If that change breaks something, free customers get the breakage first. Which seems fair to me.

    • In my experience it’s generally best to roll out changes on testing, staging, and then clients in order of how much they pay, especially if you have SLAs with the highest paying customers.

      Impact is generally lower, both to the client, and to your bank account.
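
A minimal sketch of that ordering, assuming made-up customer names and spend figures and a hypothetical `deploy_ok` health check (none of this is from the postmortem): internal environments go first, then customers in ascending order of what they pay, and the rollout halts at the first stage that fails.

```python
from typing import Callable, Iterable


def rollout_order(customers: dict[str, float]) -> list[str]:
    """Internal environments first, then customers by ascending monthly spend."""
    by_spend = sorted(customers, key=lambda name: customers[name])
    return ["testing", "staging", *by_spend]


def staged_rollout(stages: Iterable[str], deploy_ok: Callable[[str], bool]) -> list[str]:
    """Deploy stage by stage and stop at the first failure.

    Returns every stage that was touched, including the one that failed.
    """
    reached = []
    for stage in stages:
        reached.append(stage)
        if not deploy_ok(stage):
            break  # later (higher-paying) stages never see the bad change
    return reached


# Hypothetical customers and monthly spend.
customers = {"enterprise": 5000.0, "free": 0.0, "pro": 20.0}
order = rollout_order(customers)
# Pretend the change only breaks once it reaches the "pro" tier.
touched = staged_rollout(order, lambda stage: stage != "pro")
print(order, touched)
```

In this run the enterprise tier is never touched, which is the whole point of ordering by SLA exposure.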


  • If it's free, you're the product.

    Overall I think it's a good deal for both users and Cloudflare. Users get a major CDN for free, and instead of paying for it with ads, surveillance, or other shady things, they pay by being beta testers.

> I'd much rather "be the product" in this way than in the way ad companies cause at least

Your customers are the product. Cloudflare sets a first party tracking cookie on every domain they serve. They unwrap TLS and can see every product your customers look at or buy.

Whether intentionally or not, they built the Ad Network 2.0. They found the solution to ISPs not being able to snoop, and browsers locking down third party tracking.