Comment by wpietri

13 years ago

Maximizing a numerical reward signal is definitely not what we're doing when we do an A/B test.

We collect a variety of metrics. When we do an A/B test, we look at all of them as a way of understanding what effect our change has on user behavior and long-term outcomes.

A particular change may be intended to affect just one metric, but that's in an all-else-equal way. It's not often the case that our changes affect only one metric. And that's great, because that gives us hints as to what our next test should be.
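
To make "look at all of them" concrete, here is a minimal sketch (the metric names and numbers are made up for illustration, not from any real test) that just lays the per-variant summaries side by side instead of collapsing them into one reward number:

    # Hypothetical per-variant metric summaries, compared side by side
    # rather than optimized as a single number.
    metrics_a = {"signup_rate": 0.048, "7d_retention": 0.31, "tickets_per_user": 0.012}
    metrics_b = {"signup_rate": 0.054, "7d_retention": 0.29, "tickets_per_user": 0.015}

    for name in metrics_a:
        delta = metrics_b[name] - metrics_a[name]
        print(f"{name}: A={metrics_a[name]:.3f}  B={metrics_b[name]:.3f}  delta={delta:+.3f}")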

Well, I guess you could be running a MANOVA or something to test over joint outcomes, but the A/B test itself is over some sort of metric. When you set up an experiment, you need to have defined the dependent variable first. After you have randomly split your treatment groups you can do post hoc analysis, which I think is what you're referring to. But if you are optimizing, there needs to be some metric to optimize over. Of course, at the end of the day the hypothesis test just tells you prob(data at least this extreme | null = true), which I'm not sure provides a direct path to decision making.
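
For what it's worth, here's a minimal sketch of that last point, assuming a single pre-defined conversion metric and hypothetical counts: a one-sided two-proportion z-test whose output is exactly prob(data at least this extreme | null = true), and nothing more.

    from math import sqrt
    from statistics import NormalDist

    def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
        """One-sided p-value for 'variant B converts better than A'."""
        p_a, p_b = conv_a / n_a, conv_b / n_b
        p_pool = (conv_a + conv_b) / (n_a + n_b)   # pooled rate under the null
        se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
        z = (p_b - p_a) / se
        return 1 - NormalDist().cdf(z)             # P(Z >= z | null true)

    # Hypothetical counts: 480/10000 conversions on A, 540/10000 on B.
    print(two_proportion_p_value(480, 10_000, 540, 10_000))

Turning that number into a ship/don't-ship decision is the part the test itself doesn't give you.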