A/B testing: the secret engine of creation and refinement for the 21st century

Cory Doctorow at 3:33 am Thu, Apr 26, 2012

— POLICIES —

Except where indicated, Boing Boing is licensed under a Creative Commons License permitting non-commercial sharing with attribution

 


Brian Christian's long Wired feature on A/B testing does a good job of explaining the quiet revolution in product design we've experienced this century, the modes of thought that habitual A/B testing encourages, and the drawbacks to those modes. A lot of the products and services we use today are designed to a turn that makes the previous technologies look like stone axes. That's largely thanks to the ability to run multivariate tests on vast sets of diverse design-choices and quickly converge on optimal solutions that are continuously and automatically refined.

For that same reason, A/B increasingly makes meetings irrelevant. Where editors at a news site, for example, might have sat around a table for 15 minutes trying to decide on the best phrasing for an important headline, they can simply run all the proposed headlines and let the testing decide. Consensus, even democracy, has been replaced by pluralism—resolved by data...

Google insiders, and A/B enthusiasts more generally, have a derisive term to describe a decisionmaking system that fails to put data at its heart: HiPPO, “highest-paid person’s opinion.” As Google analytics expert Avinash Kaushik declares, “Most websites suck because HiPPOs create them...”

One consequence of this data-driven revolution is that the whole attitude toward writing software, or even imagining it, becomes subtly constrained. A number of developers told me that A/B has probably reduced the number of big, dramatic changes to their products. They now think of wholesale revisions as simply too risky—instead, they want to break every idea up into smaller pieces, with each piece tested and then gradually, tentatively phased into the traffic.

The A/B Test: Inside the Technology That’s Changing the Rules of Business
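
For readers who want the headline example made concrete, here is a minimal sketch in Python of what "run all the proposed headlines and let the testing decide" might look like. Nothing in it comes from the Wired piece: the function names, the hash-based bucketing, and the use of raw click-through rate as the deciding metric are all assumptions made for illustration.

```python
import hashlib


def assign_variant(user_id, variants):
    """Deterministically map a visitor to one variant.

    Hashing the (hypothetical) user id means the same visitor always
    sees the same headline, without storing any per-user state.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def click_through_rates(events):
    """Turn (variant, clicked) pairs into {variant: clicks / views}."""
    views, clicks = {}, {}
    for variant, clicked in events:
        views[variant] = views.get(variant, 0) + 1
        clicks[variant] = clicks.get(variant, 0) + (1 if clicked else 0)
    return {variant: clicks[variant] / views[variant] for variant in views}


def best_variant(events):
    """Pick the variant with the highest observed click-through rate."""
    rates = click_through_rates(events)
    return max(rates, key=rates.get)


# Hypothetical headlines and a tiny hand-made event log.
headlines = ["Headline A", "Headline B", "Headline C"]
shown = assign_variant("visitor-123", headlines)
events = [
    ("Headline A", True), ("Headline A", False),
    ("Headline B", True), ("Headline B", True),
    ("Headline C", False),
]
winner = best_variant(events)
```

Picking the highest raw rate, as `best_variant` does, ignores whether the difference is statistically meaningful; a real system would presumably wait for enough traffic before declaring a winner.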


  • http://www.nathanhornby.com/ Nathan Hornby

    “One consequence of this data-driven revolution is that the whole attitude toward writing software, or even imagining it, becomes subtly constrained. A number of developers told me that A/B has probably reduced the number of big, dramatic changes to their products. They now think of wholesale revisions as simply too risky—instead, they want to break every idea up into smaller pieces, with each piece tested and then gradually, tentatively phased into the traffic.”

    If you’ve ever had to roll out a new product to a customer, you’ll be the first to agree that ‘restrained’ is always the best policy when it comes to design. It’s known as evolution over revolution. There are exceptions, of course, but interaction design (where this applies most) consistently demands subtle iteration.

    But the amount of change is also defined by the user base of the product.  If you’re redesigning a site that only gets a couple of hundred visits a day then it’ll take years before you can get any meaningful data out of multivariate testing, so the changes need to be larger.
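
The traffic point can be made roughly quantitative. The sketch below uses the standard normal-approximation formula for comparing two proportions to estimate how many visitors each variant needs before a given lift is detectable. The 5% significance level, the 80% power, the function names, and the example conversion rates are assumptions for illustration, not figures from the comment.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variant(baseline, lifted, alpha=0.05, power=0.80):
    """Approximate sample size per variant for a two-sided two-proportion test.

    baseline and lifted are conversion rates (e.g. 0.05 and 0.055).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = baseline * (1 - baseline) + lifted * (1 - lifted)
    return ceil((z_alpha + z_beta) ** 2 * variance / (lifted - baseline) ** 2)


def days_needed(baseline, lifted, daily_visits, variants=2):
    """Days of traffic needed if visits are split evenly across variants."""
    total = variants * visitors_per_variant(baseline, lifted)
    return ceil(total / daily_visits)


# Hypothetical site: 200 visits a day, 5% baseline conversion.
big_change_days = days_needed(0.05, 0.055, daily_visits=200)     # 10% relative lift
small_change_days = days_needed(0.05, 0.0525, daily_visits=200)  # 5% relative lift
```

Under these assumptions, detecting a lift from 5% to 5.5% takes on the order of 31,000 visitors per variant, and halving the size of the lift roughly quadruples the traffic required, which is consistent with the comment's point that a low-traffic site can only realistically test large changes.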

    Also, it’s worth noting that multivariate and A/B testing aren’t synonymous terms. A/B testing covers large-scale changes, i.e. presenting two different layouts to two different customers, whereas multivariate testing involves subtle micro-changes (a headline wording, for example). They have different uses and are used in different contexts.

  • suburbanhick

    This just screams “lowest common denominator” to me. Much like focus group testing, this kind of thing just produces and enforces mediocrity. Call me elitist, but I would sooner have certain decisions made by one or more people who really know what they’re doing – based on years of proven experience, not just some chairwarmer who’s only in the position because of seniority/who they’re sleeping with/etc. – than by a piece of software. As a designer with 25+ years’ experience, I know that design by committee/consensus is usually a watered-down, wishy-washy disaster. An individual would be more likely to take a risk on something crazy that just MIGHT work than a machine that is PROGRAMMED to only go with a sure thing. Sounds to me like a lot of the people being snarky about HiPPOs are just jealous because they’re not the highest-paid person.

    • http://artdonovan.typepad.com Art

       Accurately observed!

    • dragonfrog

      Funny, I’d describe it as the polar opposite of focus-group testing.

      Focus groups bring a self-selecting sample of people into an artificial environment and ask them to choose between options.  It’s not a fair cross-section of actual users, it’s based on their possibly erroneous self-assessment of their behaviour, and it risks defaulting to the opinion of the pushiest person in the room.

      The testing described in the article puts the options into the real environment and observes what users do. It uses a truly randomly selected set of users, is based on their actual behaviour, and no one test subject’s actions influence any other’s.

      Think of political polls – they can’t describe what voters will do, so as a proxy for that, they describe what people with land-line phones and a tendency to talk to pollsters claim to be going to do. Now imagine if you could correlate every voter’s actual choices on election day, to the political messages they had seen that day, and constantly update your advertising as you learn what’s actually producing votes for your party. Not the same thing at all…

  • http://twitter.com/taro3yen taro3yen

    Ah, come on, folks: define the buzz term “A/B testing” in the lead paragraph. Sheesh.

    • http://www.kmoser.com kmoser

       You must have seen the version of the article that didn’t include the definition. :)

  • puppybeard

    The thing about A/B testing is that you can end up with a product nobody wants, which is often what happens when you try and please everybody.

    Let’s say you do 10 rounds of A/B testing and every time, one iteration wins with 51%. Congratulations, you’ve made the perfect product. For 0.12% of your target market (100% × 0.51^10). For everyone else, results will vary. Whoop-de-doo.
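
The arithmetic behind that 0.12% figure can be checked in a couple of lines of Python. The sketch assumes, as the comment implicitly does, that each round's winning 51% is drawn independently of the other rounds; the function name is made up for illustration. If preferences were correlated across rounds, so that largely the same users sided with the winner each time, the share would be higher.

```python
def share_pleased_every_round(win_share, rounds):
    """Fraction of users who preferred the winner in every round,
    assuming the rounds are statistically independent."""
    return win_share ** rounds


share = share_pleased_every_round(0.51, 10)
print(f"{share:.2%}")  # prints 0.12%
```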

    Jeff Atwood of Coding Horror did a great take on it a few years ago: http://www.codinghorror.com/blog/2010/07/groundhog-day-or-the-problem-with-ab-testing.html

    Personally, I think most research should happen before you create something, those results should be considered by experts, and then something should be built.

    Insight and experience have always been far superior to data. You know where people allow their decisions to be ruled by data? The financial markets. A disaster.

    As for Google staff sneering at decisions that aren’t driven by data, well, what was their last great decision? Google+ was a disaster. Google Drive looks OK, but talk about being late to the party. Just like Google+, people already have services taking care of that need. Dunning-Kruger case studies, the lot of them.

    • Mantissa128

      Actually, Google+ seemed like exactly the decision a HiPPO would make. It is completely uncharacteristic of the company to offer something nobody wanted, and seems like a shift away from what made them a success. They don’t do the 20% personal project time anymore, and it’s turning into a standard, run-of-the-mill corporation… so sad.

      And I don’t get the hate on in this thread for data-driven decisions, or the mindless support for ‘experts.’ Who do you think those experts are? The highest-paid people in the room.

      We have oodles of experts. Buckets of them. But they are human beings, subject to confirmation bias along with everything else that causes our decision making to go off-track. If experts and the people who followed their advice were wildly successful, we’d live in a technocracy now.

      • puppybeard

        I’d say you’re half right about G+, it was hardly the result of A/B testing, more about competing with Facebook for users’ time.

        The ‘experts’ are the same people who decided what data will be analysed, so perhaps they can never be removed from the equation.

        The clue to why A/B testing isn’t respected is its origin, in marketing. Marketing has a high failure rate, so agencies need data to justify as many decisions as possible. When a campaign fails, and most campaigns do fail, if an agency can say they did their homework and point to some data, they have a better chance of retaining the client, compared to if they simply said “we thought it was a good idea”. The net result is the same, but they have an excuse.

        As for Google, maybe the faith in A/B testing is symptomatic of how innovation is grinding to a halt. People get paid a lot to work there, but most projects seem to fail these days. If they can say “the numbers made me do it” maybe they can cover their arses and keep their jobs. Maybe it’s that kind of risk-aversion, the refusal to say, “I know what I’m doing”, that stops them from being so-called HiPPOs?

        I’m a prototype developer, and in my current job we find the best way to make something people want is to ask them what they want. Imagine that! Get masses of meaningful data, opinions and perspectives, rather than “100,000 people clicked on the pink button”, sift through it, and incorporate it into your design documents. Make something good, send it out, engage with the users, invite feedback, sift through it, and iterate.

        I wouldn’t rule out A/B testing for the odd component, but I don’t believe in treating customers like lab rats.

  • EvilSpirit

    See also: Hill Climbing http://en.wikipedia.org/wiki/Hill_climbing

    Useful, but hardly without its problems.
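
To illustrate the problem being hinted at, here is a small Python sketch of greedy hill climbing getting stuck on a local peak. The one-dimensional "design space", the `landscape` scoring function, and its two peaks are entirely hypothetical; the point is only that a search that accepts nothing but immediate improvements can stop short of the best available option.

```python
def hill_climb(score, start, step=1, max_steps=1000):
    """Greedy hill climbing: move to the better neighbour until neither improves."""
    current = start
    for _ in range(max_steps):
        best = max((current - step, current + step), key=score)
        if score(best) <= score(current):
            return current  # no neighbour is better, so stop here
        current = best
    return current


def landscape(x):
    """Hypothetical score with a small peak at x=2 and a taller one at x=10."""
    return max(4 - (x - 2) ** 2, 10 - (x - 10) ** 2)


stuck = hill_climb(landscape, start=0)  # climbs the nearby small peak
lucky = hill_climb(landscape, start=7)  # happens to start near the tall peak
print(stuck, landscape(stuck))  # prints 2 4
print(lucky, landscape(lucky))  # prints 10 10
```

Starting from 0, every single step away from x=2 looks worse, so the search never discovers the taller peak at x=10. This is the sense in which purely incremental testing can be useful but limited.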

  • http://www.facebook.com/profile.php?id=1375676891 Tim Downing

    Unfortunately this is what has driven food for the past 20-30 years, and given us the current over-salted, over-sugared garbage that fills supermarkets today. In A/B tests saltier almost always wins, and sweeter always wins. After a few generations of A/B, then B/C, then C/D testing, the taste of D is very far removed from A. Humans are led down this path year over year because the changes are so subtle, but if people did an A/D test they’d reject D as not tasting like food.

  • http://www.jimdraws.com Thorzdad

    “A lot of the products and services we use today are designed to a turn that makes the previous technologies look like stone axes. That’s largely thanks to the ability to run multivariate tests on vast sets of diverse design-choices and quickly converge on optimal solutions that are continuously and automatically refined.”

    Or thanks to intuitive leaps taken by creative and insightful designers. A/B testing would probably not have resulted in the iPhone. It required a creative (and human) leap of intuition on the part of the developers and designers to simply reject the dominant (at the time) smartphone paradigm.

    As for Google being praised for being data-driven…All I can say is “41 Shades of Blue”
    http://stopdesign.com/archive/2009/03/20/goodbye-google.html

  • atimoshenko

    Has anyone A/B tested the efficacy of A/B testing against the work of a single insightful designer?

    Seems to me that A/B testing would be good for refinement (and verification), but not nearly as good for speculative redefinition.

  • Jeremy Mesiano-Crookston

    Where editors at a news site, for example, might have sat around a table for 15 minutes trying to decide on the best phrasing for an important headline, they can simply run all the proposed headlines and let the testing decide. Consensus, even democracy, has been replaced by pluralism—resolved by data.

    This is a pretty monstrous idea. Just thought I’d point that out.

  • http://www.facebook.com/people/David-Witt/1041651388 David Witt

    Who needs Jony Ive? We’ve got A/B testing!

  • Walter Reade

    A/B testing also goes the other way – companies use it to justify taking features or quality out of a product. That’s why, e.g., you go through toilet paper rolls faster than 5 years ago. They A/B test a product with fewer sheets, or a narrower diameter, or less thickness, and “statistically” it fares no worse than the control. The company gets its cost savings. The consumer loses.

    http://schoolofgoodenough.blogspot.com/2010/05/incredible-shrinking-toilet-paper.html 

  • http://www.kmoser.com kmoser

    Who decides which variations to test? The HiPPO!

  • EvilSpirit

    Seems to me that if Google really believed in A/B testing, they’d just hire everybody who applied, and then see who works out.

  • ian_b

    A-B testing is not meant to grind a site into mediocrity.

    In an e-commerce environment where merchandisers, engineers/programmers, designers, product developers, and photographers are all trying to make the site better, it can be difficult to pin down who deserves credit, what’s working, and what isn’t. Especially if each team pushes 5 new changes into a release. A-B testing everything lets everyone focus on the impact they make within their discipline.

    You don’t always test money. You might test return visitors, new registrations, social media activity, or anything that matters in the context of the test. Testing solely on revenue can be very detrimental to a brand.

  • Bill Reals

    We do lots of A/B tests on our site and inside of our product here at work. They work, but what can sometimes get lost is what about them worked. You’ll have data that says, “It raised conversion by 1%” but you then have to infer the customer behavior. You have to test small, individual changes or you can’t really measure them. Also, A/B testing for websites or products can sometimes turn into what we call the “Frankenstein”, where the small incremental changes add up and one day you wake up and the product seems to have lost its focus.

  • http://grumer.org/ Avram Grumer

    The thought that kept recurring to me as I read the article: What successes can Google point to that derive from this approach? I’m talking about something inside Google, not the anecdote about the Obama campaign’s website.

    Back in the ’90s, Google’s original search page was often praised for its uncluttered minimalism. That’s the last time I can recall any Google product being praised for the quality of its design.