
Comment by throwup238

6 days ago

> In short, not many people want to funnel users through N code paths with slightly different behaviors, because not many people have a ton of users, a ton of engineering capacity, and a ton of potential upside from marginal improvements.

I’ve been in companies that have tried dozens if not hundreds of A/B tests with zero statistically significant results. I figure that by the law of probabilities they would have gotten at least one significant experiment just by chance, but most products have such small user bases and make such large changes at a time that it’s completely pointless.
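
For a rough sense of the probabilities involved (a back-of-the-envelope sketch, assuming independent tests judged at the conventional alpha = 0.05; the test counts are made up for illustration):

    # Chance of seeing zero "significant" results across N independent A/B tests,
    # even if every change tested were genuinely useless (all null hypotheses true).
    alpha = 0.05                       # conventional significance threshold
    for n in (20, 50, 100):
        p_zero = (1 - alpha) ** n      # probability that not even a false positive shows up
        print(n, round(p_zero, 3))     # 20 -> 0.358, 50 -> 0.077, 100 -> 0.006

Even if literally every change had zero effect, zero hits across 100 honest tests at that threshold would be roughly a 1-in-170 outcome.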

All my complaints fell on deaf ears until the PM in charge would get on someone’s bad side, and then those metrics would be used to push them out. I think they’re largely a political tool, like all those management consultants who only come in to justify an executive’s predetermined goals.

> I’ve been in companies that have tried dozens if not hundreds of A/B tests with zero statistically significant results.

What I've seen in practice is that some places trust their designers' decisions and only deploy A/B tests when competent people disagree, or there's no clear, sound reason to choose one design over another. Surprise surprise, those alternatives almost always test very close to each other!

Other places remove virtually all friction from A/B testing and then use it religiously for every pixel in their product, and they get results, but often it's things like "we discovered that pink doesn't work as well as red for a warning button," stuff they never would have tried if they didn't have to feed the A/B machine.

From all the evidence I've seen in places I've worked, the motivating stories of "we increased revenue 10% by a random change nobody thought would help" may only exist in blog posts.

  • I think trusting your designers is probably the way to go for most teams. Good designers have solid intuitions and design principles for what will increase conversion rates. Many designers will still want A/B tests because they want to be able to justify their impact, but they should probably be denied. For really important projects, designers should do small-sample-size research to validate their designs, the way we used to.

    I think A/B tests are still good for measuring stuff like system performance, which can be really hard to predict. Flipping a switch to completely change how you do caching can be scary.
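
    A minimal sketch of how that kind of backend experiment is often gated (the helper and names here are hypothetical, not from any particular feature-flag library; the only real idea is deterministic assignment by hashing a user ID so each user always hits the same code path):

        import hashlib

        def in_treatment(user_id: str, experiment: str, rollout_pct: float) -> bool:
            """Deterministically bucket a user so they always see the same code path."""
            digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
            bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
            return bucket < rollout_pct

        # Send e.g. 5% of users through the new caching layer and compare latency and
        # error metrics between buckets before flipping the switch for everyone.
        if in_treatment(user_id="u-12345", experiment="new-cache", rollout_pct=0.05):
            ...  # hypothetical new caching path
        else:
            ...  # hypothetical existing caching path

    The deterministic hash matters more than the exact scheme: a given user should stay in one bucket for the whole experiment, or the comparison gets muddy.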

    • A/B testing the user interface is very annoying when you are on the phone trying to guide someone through using a website. "Click the green button on the left" - "What do you mean? There is nothing green on the screen." - "Are you on xyz.com? Can you read out the address to me please?" ... Oh, so many hours wasted in tech support.

      1 reply →

    • Good designers generally optimize for taste, not for conversions. I have seen so many ugly-as-sin designs win, as measured by testing. If you want to build a product that is tasteful, designers are the way to go. If you want to build a product optimized for a clear business metric like sales or upgrades or whatnot, experimentation works better.

      It just depends on the goals of the business.

  • In paid B2B SaaS, A/B testing is usually a very good idea for the user acquisition flow and onboarding, but not in the actual product per se.

    Once the user has committed to paying, they will probably put up with whatever annoyance you put in their way. Also, since they are paying, if something is _really_ annoying they often contact the SaaS people.

    Most SaaS companies don't really care that much about "engagement" metrics (i.e. keeping users IN the product). Those are the kinds of metrics that are the easiest to see move.

    In fact most people want a product they can get in and out ASAP and move on with their lives.

    • Many SaaS companies care about engagement metrics, especially if they have to sell the product, i.e. their revenue depends on salespeople convincing customers to renew or upgrade their licenses at a certain level, for so many seats at $x/year.

      For example, I worked on a new feature for a product, and the metrics showed a big increase in engagement among several customers' users: they were not only using our software more, they were also doing their work much faster than before. We used that to justify raising our prices -- customers were satisfied with the product before, at the previous rates, and we could prove that we had just made it significantly more useful.

      I know of at least one case where we shared engagement data with a power user at a customer who didn't have purchase authority but was able to join it with their internal data to show that use of our software correlated with increased customer satisfaction scores. They took that data to their boss, who immediately bought more seats and scheduled user training for all of their workers who weren't using our software.

      We also used engagement data to convince customers not to cancel. A lot of times people don't know what's going on in their own company. They want to cancel because they think nobody is using the software, and it's important to be able to tell them how many daily and hourly users they have on average. You can also give them a list of the most active users and encourage them to reach out and ask what the software does for them and what the impact would be of cancelling.
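
      For what it's worth, the numbers behind those conversations usually come down to a couple of aggregations over a usage-events log. A rough sketch with an invented pandas table (the column names and data are hypothetical):

          import pandas as pd

          # Hypothetical usage log: one row per user action.
          events = pd.DataFrame({
              "user_id": ["u1", "u2", "u1", "u3", "u2"],
              "ts": pd.to_datetime([
                  "2024-05-01 09:10", "2024-05-01 14:02", "2024-05-02 08:55",
                  "2024-05-02 11:30", "2024-05-03 16:45",
              ]),
          })

          # Average daily and hourly active users, plus a list of the heaviest users.
          daily_active = events.groupby(events["ts"].dt.date)["user_id"].nunique()
          hourly_active = events.groupby(events["ts"].dt.floor("h"))["user_id"].nunique()
          avg_daily_users = daily_active.mean()
          avg_hourly_users = hourly_active.mean()
          most_active = events["user_id"].value_counts().head(10)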

> I’ve been in companies that have tried dozens if not hundreds of A/B tests with zero statistically significant results.

Well, at least it looks like they avoided p-hacking to show more significance than they had! That's ahead of much of science, alas.

> I’ve been in companies that have tried dozens if not hundreds of A/B tests with zero statistically significant results.

Yea, I've been here too. And in every analytics meeting everyone went "well, we know it's not statistically significant but we'll call it the winner anyway". Every. Single. Time.

Such a waste of resources.

  • Is it a waste? You proved the change wasn't harmful.

    • Statistically insignificant means you didn't prove anything by usual standards. I do agree that it's not a waste, as knowing that you have a 70% chance that you're going in the right direction is better than nothing. The 2 sigma crowd can be both too pessimistic and not pessimistic enough.
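
      One way to make that "70% chance" reading concrete (a hedged sketch with made-up conversion numbers, using a flat Beta(1, 1) prior on each variant's conversion rate; one common Bayesian framing, not the only one):

          import numpy as np

          rng = np.random.default_rng(0)

          # Made-up results: B looks slightly better, nowhere near "2 sigma" significant.
          conv_a, n_a = 200, 2000   # control: 10.0% conversion
          conv_b, n_b = 210, 2000   # variant: 10.5% conversion

          # Posterior samples of each conversion rate under a flat Beta(1, 1) prior.
          post_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, size=100_000)
          post_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, size=100_000)

          print((post_b > post_a).mean())   # about 0.7: probably better, far from proven

      On the same made-up numbers, a standard two-sided test comes back around p = 0.6: clearly not significant, yet still mildly informative about direction.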

      1 reply →

    • You can still enshittify something by degrees this way.

      I think the disconnect here is between some people thinking of A/B testing as something you try once a month, and a place like Amazon where you do it all the time with hundreds of employees poking at things.