A/B testing: the secret engine of creation and refinement for the 21st century


21 Responses to “A/B testing: the secret engine of creation and refinement for the 21st century”

  1. “One consequence of this data-driven revolution is that the whole attitude toward writing software, or even imagining it, becomes subtly constrained. A number of developers told me that A/B has probably reduced the number of big, dramatic changes to their products. They now think of wholesale revisions as simply too risky—instead, they want to break every idea up into smaller pieces, with each piece tested and then gradually, tentatively phased into the traffic.”

    If you’ve ever had to roll out a new product to a customer, you’ll be the first to agree that ‘restrained’ is always the best policy when it comes to design. This is often described as evolution over revolution. There are exceptions, of course; but interaction design (where this applies most) consistently demands subtle iteration.

    But the amount of change is also defined by the user base of the product.  If you’re redesigning a site that only gets a couple of hundred visits a day then it’ll take years before you can get any meaningful data out of multivariate testing, so the changes need to be larger.
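    A rough power calculation backs this up. The sketch below is illustrative only (the 5% baseline conversion rate and the target lift are assumed; the constants correspond to roughly 5% significance and 80% power), reusing the couple-of-hundred-visits-a-day figure from above:

```python
import math

def visitors_needed(p_base, lift, z_alpha=1.96, z_power=0.84):
    """Approximate visitors per variant for a two-proportion test
    (~5% significance, ~80% power) to detect `lift` over `p_base`."""
    p_avg = p_base + lift / 2
    n = (z_alpha + z_power) ** 2 * 2 * p_avg * (1 - p_avg) / lift ** 2
    return math.ceil(n)

per_variant = visitors_needed(p_base=0.05, lift=0.01)  # ~8,150 visitors per arm
days = math.ceil(2 * per_variant / 200)                # ~82 days at 200 visits/day
```

    Even a full one-point lift takes months to detect at that traffic level, and halving the detectable lift quadruples the required sample; a multivariate test, which splits the same traffic across many more cells, stretches the timeline further still.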

    Also, it’s worth noting that multivariate and A/B testing aren’t synonymous terms. A/B testing compares large-scale changes, i.e. presenting two different layouts to two different groups of customers, whereas multivariate testing involves subtle micro-changes (the wording of a headline, for example). They have different uses and are used in different contexts.
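    The mechanical difference is easy to sketch. In the toy Python below (the user IDs and factor lists are invented), A/B assignment splits users between two whole layouts, while multivariate testing crosses several small factors into cells:

```python
import hashlib
import itertools

def bucket(user_id, n_variants):
    """Deterministically assign a user to one of n variants."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % n_variants

# A/B: one large change, two whole layouts
layout = ["layout_A", "layout_B"][bucket("user-42", 2)]

# Multivariate: every combination of micro-changes becomes a test cell
headlines = ["Buy now", "Start today"]
buttons = ["green", "blue"]
cells = list(itertools.product(headlines, buttons))  # 4 cells
headline, button = cells[bucket("user-42", len(cells))]
```

    Hashing the user ID keeps assignment sticky (the same visitor always sees the same variant) without storing any per-user state.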

  2. suburbanhick says:

    This just screams “lowest common denominator” to me. Much like focus group testing, this kind of thing just produces and enforces mediocrity. Call me elitist, but I would sooner have one or more people who really know what they’re doing – based on years of proven experience, not just some chairwarmer who’s only in the position because of seniority/who they’re sleeping with/etc. – making certain decisions than a piece of software. As a designer with 25+ years experience, I know from experience that design by committee/consensus is usually a watered-down, wishy-washy disaster. An individual would be more likely to take a risk on something crazy that just MIGHT work than a machine that is PROGRAMMED to only go with a sure thing. Sounds to me like a lot of the people being snarky about HiPPOS are just jealous because they’re not the highest-paid person.

    • Art says:

       Accurately observed!

    • dragonfrog says:

      Funny, I’d describe it as the polar opposite of focus-group testing.

      Focus groups bring a self-selecting sample of people into an artificial environment and ask them to choose between options.  It’s not a fair cross-section of actual users, it’s based on their possibly erroneous self-assessment of their behaviour, and it risks defaulting to the opinion of the pushiest person in the room.

      The testing described in the article puts the options into the real environment, and observes what they do.  It uses a truly randomly selected set of users, is based on their actual behaviour, and no one test subject’s actions influence any other’s.

      Think of political polls – they can’t describe what voters will do, so as a proxy for that, they describe what people with land-line phones and a tendency to talk to pollsters claim to be going to do.  Now imagine if you could correlate every voter’s actual choices on election day to the political messages they had seen that day, and constantly update your advertising as you learn what’s actually producing votes for your party.  Not the same thing at all…

  3. taro3yen says:

    Ah, come on, folks—define the buzz term “A/B testing” in the lead paragraph. Sheesh.

  4. puppybeard says:

    The thing about A/B testing is that you can end up with a product nobody wants, which is often what happens when you try and please everybody.

    Let’s say you do 10 rounds of A/B testing and every time, one iteration wins by 51%. Congratulations, you’ve made the perfect product. For 0.12% of your target market. (100% * 0.51^10)  For everyone else, results will vary. Whoop-de-doo.
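    The arithmetic in that back-of-the-envelope claim is easy to check (taking the hypothetical 51% majorities at face value):

```python
# If each of ten rounds is decided by a bare 51% majority, only users on
# the winning side of every round are fully catered for.
share = 0.51 ** 10
print(f"{share:.2%}")  # prints 0.12%
```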

    Jeff Atwood of Coding Horror did a great take on it a few years ago: http://www.codinghorror.com/blog/2010/07/groundhog-day-or-the-problem-with-ab-testing.html

    Personally, I think most research should happen before you create something, those results should be considered by experts, and then something should be built.

    Insight and experience have always been far superior to data. You know where people allow their decisions to be ruled by data? The financial markets. A disaster.

    As for Google staff sneering at decisions that aren’t driven by data, well, what was their last great decision? Google+ was a disaster. Google Drive looks OK, but talk about being late to the party. Just like Google+, people already have services taking care of that need. Dunning-Kruger case studies, the lot of them.

    • Mantissa128 says:

      Actually, Google+ seemed like exactly the decision a HiPPO would make. It is completely uncharacteristic of the company to offer something nobody wanted, and seems like a shift away from what made them a success. They don’t do the 20% personal project time anymore, and it’s turning into a standard, run-of-the-mill corporation… so sad.

      And I don’t get the hate on in this thread for data-driven decisions, or the mindless support for ‘experts.’ Who do you think those experts are? The highest-paid people in the room.

      We have oodles of experts. Buckets of them. But they are human beings, subject to confirmation bias along with everything else that causes our decision making to go off-track. If experts and the people who followed their advice were wildly successful, we’d live in a technocracy now.

      • puppybeard says:

        I’d say you’re half right about G+, it was hardly the result of A/B testing, more about competing with Facebook for users’ time.

        The ‘experts’ are the same people who decided what data will be analysed, so perhaps they can never be removed from the equation.

        The clue to why A/B testing isn’t respected is its origin in marketing. Marketing has a high failure rate, so agencies need data to justify as many decisions as possible. When a campaign fails, and most campaigns do fail, an agency that can say it did its homework and point to some data has a better chance of retaining the client than one that simply says “we thought it was a good idea”. The net result is the same, but they have an excuse.

        As for Google, maybe the faith in A/B testing is symptomatic of how innovation is grinding to a halt. People get paid a lot to work there, but most projects seem to fail these days. If they can say “the numbers made me do it”, maybe they can cover their arses and keep their jobs. Maybe it’s that kind of risk-aversion, the refusal to say “I know what I’m doing”, that stops them from being so-called HiPPOs?

        I’m a prototype developer, and in my current job we find the best way to make something people want is to ask them what they want. Imagine that! Get masses of meaningful data, opinions and perspectives, rather than “100,000 people clicked on the pink button”; sift through it, and incorporate it into your design documents. Make something good, send it out, engage with the users, invite feedback, sift through it, and iterate.

        I wouldn’t rule out A/B testing for the odd component, but I don’t believe in treating customers like lab rats.

  5. EvilSpirit says:

    See also: Hill Climbing http://en.wikipedia.org/wiki/Hill_climbing

    Useful, but hardly without its problems.
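    The analogy can be made concrete. A minimal hill-climbing sketch (the two-peaked objective function is invented for illustration) converges on whichever peak is nearest, not the tallest one:

```python
def hill_climb(f, x, step=0.1, max_iters=1000):
    """Greedy local search: move to whichever neighbour improves f."""
    for _ in range(max_iters):
        best = max([x, x - step, x + step], key=f)
        if best == x:      # no neighbour is better: a (possibly local) peak
            return x
        x = best
    return x

def f(x):
    # Two peaks, near x = -2 and x = +2; the right-hand one is taller.
    return -(x ** 2 - 4) ** 2 + x

peak = hill_climb(f, x=-1.0)   # starts nearest the smaller hill
```

    Starting at x = -1.0, the search settles near x = -2 even though f is higher near x = +2: the incremental-refinement trap the comment alludes to.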

  6. Tim Downing says:

    Unfortunately this is what has driven food for the past 20-30 years, and given us the current over-salted, over-sugared garbage that fills supermarkets today.  In A/B tests, saltier almost always wins and sweeter always wins.  After a few generations of A/B testing, then B/C, then C/D, the taste of D ends up very far removed from A.
    Humans are led down this path year over year because the changes are so subtle; but if people did an A/D test, they’d reject it as not tasting like food.

  7. Thorzdad says:

    “A lot of the products and services we use today are designed to a turn that makes the previous technologies look like stone axes. That’s largely thanks to the ability to run multivariate tests on vast sets of diverse design-choices and quickly converge on optimal solutions that are continuously and automatically refined.”

    Or thanks to intuitive leaps taken by creative and insightful designers. A/B testing would probably not have resulted in the iPhone. It required a creative (and human) leap of intuition on the part of the developers and designers to simply reject the dominant (at the time) smartphone paradigm.

    As for Google being praised for being data-driven…All I can say is “41 Shades of Blue”

  8. atimoshenko says:

    Has anyone A/B tested the efficacy of A/B testing against the work of a single insightful designer?

    Seems to me that A/B testing would be good for refinement (and verification), but not nearly as good for speculative redefinition.

  9. Jeremy Mesiano-Crookston says:

    “Where editors at a news site, for example, might have sat around a table for 15 minutes trying to decide on the best phrasing for an important headline, they can simply run all the proposed headlines and let the testing decide. Consensus, even democracy, has been replaced by pluralism—resolved by data.”

    This is a pretty monstrous idea. Just thought I’d point that out.

  10. David Witt says:

    Who needs Jony Ive? We’ve got A/B testing!

  11. Walter Reade says:

    A/B testing also goes the other way – companies use it to justify taking features or quality out of a product. That’s why, e.g., you go through toilet-paper rolls faster than you did 5 years ago. They A/B test a product with fewer sheets, or a narrower diameter, or less thickness, and “statistically” it fares no worse than the control. The company gets its cost savings. The consumer loses.
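    The statistical sleight of hand here is that “no significant difference” is not the same as “no difference”. A small illustrative check (all numbers invented) shows a real three-point drop in satisfaction hiding inside a confidence interval that spans zero:

```python
import math

# Satisfaction with the old product vs. a "fewer sheets" version,
# tested on only 500 users per arm (a real 3-point drop):
p_old, p_new, n = 0.90, 0.87, 500
diff = p_new - p_old
se = math.sqrt(p_old * (1 - p_old) / n + p_new * (1 - p_new) / n)
lo, hi = diff - 1.96 * se, diff + 1.96 * se
# The 95% interval spans zero, so the cheaper version "fares no worse" --
# the test was simply too small to detect the degradation.
```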


  12. kmoser says:

    Who decides which variations to test? The HiPPO!

  13. EvilSpirit says:

    Seems to me that if Google really believed in A/B testing, they’d just hire everybody who applied, and then see who works out.

  14. ian_b says:

    A-B testing is not meant to grind a site into mediocrity.

    In an e-commerce environment where merchandisers, engineers/programmers, designers, product developers, and photographers are all trying to make the site better, it can be difficult to pin down who deserves credit, what’s working, and what isn’t. Especially if each team pushes 5 new changes into a release. A-B testing everything lets everyone focus on the impact they make within their discipline.

    You don’t always test money. You might test return visitors, new registrations, social media activity, or anything that matters in the context of the test. Testing solely on revenue can be very detrimental to a brand.

  15. Bill Reals says:

    We do lots of A/B tests on our site and inside our product here at work. They work, but what can sometimes get lost is what about them worked. You’ll have data that says “it raised conversion by 1%”, but you then have to infer the customer behavior behind it. You have to test small, individual changes or you can’t really measure them. Also, A/B testing for websites or products can sometimes turn into what we call the “Frankenstein”, where the small incremental changes add up and one day you wake up and the product seems to have lost its focus.
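    The measurement half of this is worth spelling out. A standard two-proportion z-test (the counts below are invented for illustration) shows why a small lift needs a large sample before it means anything:

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z-statistic for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# A 1% relative lift (5.00% -> 5.05%) over 10,000 users per arm:
z = two_proportion_z(500, 10_000, 505, 10_000)  # z ~ 0.16, far below 1.96
```

    Until z clears roughly 1.96, the observed lift is indistinguishable from noise, so a headline number like “it raised conversion by 1%” says little on its own about what customers actually did.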

  16. Avram Grumer says:

    The thought that kept recurring to me as I read the article: What successes can Google point to that derive from this approach? I’m talking about something inside Google, not the anecdote about the Obama campaign’s website.

    Back in the ’90s, Google’s original search page was often praised for its uncluttered minimalism. That’s the last time I can recall any Google product being praised for the quality of its design. 
