Comment by baxtr

6 days ago

I wouldn’t call that A/B testing but rather a gradual roll-out.

If you roll it back upon seeing problems, then you're doing something meaningful, at least. IMO 90+% of the value of A/B testing comes from two things, a) forcing engineers to build everything behind flags, and b) making sure features don't crater your metrics before freezing them in and making them much more difficult to remove (both politically and technically).

Re: b), if you've ever gotten into a screaming match with a game designer angry over the removal of their pet feature, you will really appreciate the political cover that having numbers provides...

I think parent is confusing A/B testing with feature flags, which can be used for A/B tests but also for roll-outs.

  • Not the parent, but some actual practitioners do. A change is based on gut feeling, and it's usually correct, but internal politics require a show of impartiality, so an "A/B test" is run to show that the change is "objectively better", whether the statistics support that or not.

  • Feature flags tend to be all or nothing, and A/B testing instrumentation can also be used to roll out feature flags.

    It’s complicated.

  • I’m aware of the distinction. A/B testing is the killer app for feature flags from the perspective of business decision makers.

I think gradual rollout can use the same mechanism, but for a different reason: avoiding pushing out a potentially buggy product to all users in one sweep.

It becomes an A/B test when you measure user activity to decide whether to roll out to more users.
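
To make the "same mechanism" point concrete, here is a minimal sketch of percentage-based flag bucketing. Everything in it is an assumption for illustration (the names `bucket`, `is_enabled`, `variant`, and the flag name `new-checkout` are hypothetical, not any particular flag library's API). The idea is that a stable hash of the flag and user ID decides who sees the feature; used alone that is a gradual rollout, and it only turns into an A/B test once you compare metrics between the two groups.

```python
import hashlib


def bucket(user_id: str, flag: str) -> int:
    """Map (flag, user) to a stable bucket in 0..99.

    Hashing the flag name together with the user ID keeps a user's
    bucket fixed for one flag but uncorrelated across different flags.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(user_id: str, flag: str, rollout_percent: int) -> bool:
    """Gradual rollout: the feature is on for users below the dial setting."""
    return bucket(user_id, flag) < rollout_percent


def variant(user_id: str, flag: str, rollout_percent: int) -> str:
    """A/B view of the same split: label the group so metrics can be compared."""
    return "treatment" if is_enabled(user_id, flag, rollout_percent) else "control"
```

Because the bucket is fixed per user, raising `rollout_percent` only adds users; nobody who already has the feature loses it when the dial goes up.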

  • Has my CPU use gone up? No.

    Have my error logs gotten bigger? No.

    Have my tech support calls gone up? No.

    Okay then turn the dial farther.
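
The checklist above could be sketched roughly like this. It is only an illustration under stated assumptions: the `Guardrail` class, the 5% tolerance, the step schedule in `STEPS`, and the rollback-to-zero behavior are all hypothetical choices, and the metric values would come from whatever monitoring you already have.

```python
from dataclasses import dataclass


@dataclass
class Guardrail:
    """One operational metric where higher is worse (CPU, errors, support calls)."""
    name: str
    baseline: float          # value before the rollout step
    current: float           # value observed after the rollout step
    tolerance: float = 0.05  # assumed: allow up to a 5% relative increase

    def regressed(self) -> bool:
        return self.current > self.baseline * (1 + self.tolerance)


# Assumed dial positions, in percent of users.
STEPS = [1, 5, 25, 50, 100]


def next_rollout_percent(current_percent: int, guardrails: list[Guardrail]) -> int:
    """Turn the dial farther if every guardrail is healthy, else roll back to 0."""
    if any(g.regressed() for g in guardrails):
        return 0
    for step in STEPS:
        if step > current_percent:
            return step
    return 100  # already fully rolled out


checks = [
    Guardrail("cpu_percent", baseline=40.0, current=41.0),
    Guardrail("errors_per_min", baseline=12.0, current=11.0),
    Guardrail("support_calls_per_day", baseline=30.0, current=30.0),
]
print(next_rollout_percent(5, checks))  # prints 25
```

Note that these guardrails only answer "did anything break?"; they say nothing about whether users like the change, which is the part that needs an actual A/B comparison.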