Lies, damned lies, and big data: the dictatorship of data

Kenneth Cukier and Viktor Mayer-Schönberger, co-authors of the excellent book Big Data, take a good, skeptical look in the MIT Tech Review at the risks of relying on data to the exclusion of other factors in decision-making. They use Robert McNamara, the hyper-rational architect of the Vietnam War, as their poster child for data-blindness, and discuss how modern firms have repeated his mistakes in other domains:

The dictatorship of data ensnares even the best of them. Google runs everything according to data. That strategy has led to much of its success. But it also trips up the company from time to time. Its cofounders, Larry Page and Sergey Brin, long insisted on knowing all job candidates’ SAT scores and their grade point averages when they graduated from college. In their thinking, the first number measured potential and the second measured achievement. Accomplished managers in their 40s were hounded for the scores, to their outright bafflement. The company even continued to demand the numbers long after its internal studies showed no correlation between the scores and job performance.

Google ought to know better, to resist being seduced by data’s false charms. The measure leaves little room for change in a person’s life. It counts book smarts at the expense of knowledge. And it may not reflect the qualifications of people from the humanities, where know-how may be less quantifiable than in science and engineering. Google’s obsession with such data for HR purposes is especially queer considering that the company’s founders are products of Montessori schools, which emphasize learning, not grades. By Google’s standards, neither Bill Gates nor Mark Zuckerberg nor Steve Jobs would have been hired, since they lack college degrees.

The Dictatorship of Data (via O'Reilly Radar)


  1. I remember a hysterical (for geeks) bbq a couple of years ago where someone showed up with the “Google candidate questions” book.  The importance he put in it was funny, but every time he put one of the questions to the assembled carnivores, it was batted back to him within a minute.

    He was emotionally bruised and wide-eyed at the end.  No-one told him we’d all heard every single one of the questions in the book multiple times down at the pub. We were a bit mean.

    btw, here's an interview with a senior Google hiring person, knocking all of the questions, GPAs, etc. into the crowd:
    http://www.linkedin.com/today/post/article/20130620142512-35894743-on-gpas-and-brain-teasers-new-insights-from-google-on-recruiting-and-hiring?goback=%2Enmp_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1&trk=NUS_UNIU_PEOPLE_FOLLOW-megaphone-fllw

  2. Collecting it is one thing, using it without any kind of proof it matters is quite another. I might well ask for that and anything else I could legally ask for, then see how people worked out, and try to figure out what really mattered.

    1. There can be two problems with that approach, though.

      First, it pushes you toward looking at what can easily be collected, and if you do find some correlation, it can fool you into thinking that’s what mattered most. It is not unusual to find people stuck on what we all know are poor measures, simply because they’re the most quantifiable.

      Second, when people do it on a large enough scale, Goodhart’s Law applies. It is important to remember that when what you are measuring is just a stand-in, it can end up divorced from what it is supposed to represent.

  3. On the larger subject of Big Data – At the end of the day, companies are going to promote and reward the people who fake the p-values. Maybe they will calculate real results, but then they will conveniently erase all the uncertainty that executives hate. Or they’ll just pencil in the values management wants to see and then take a long lunch. Who’s to know the difference? Just bump the p-value’s decimal point one space to the left, and you are a god of data analysis.

    Don’t forget, it was this sort of mumbo-jumbo that left the pharmaceutical industry high and dry with empty product pipelines.

    Never lose sight of the fact that it’s all about selling servers and software licenses. Nothing else matters. The vendors will make up whatever story they need to sell more servers and enterprise software licenses.

    1. And yet, by merely existing and expressing an opinion contrary to the “money at all costs” position, you yourself have proven that it’s not “all about selling servers and software licenses.”

      Sure, that’s a huge part of it! And yes, obviously many people and many cultural norms promote grossly unethical statistics and business practices. But not all of them! Some of them even work against the unethical and the unsavory. To say that Big Data or any other type of data is innately about unfettered sociopathic capitalism is to be no better than a Westboro Baptist fundamentalist, philosophically speaking.

  4. Accomplished managers in their 40s were hounded for the scores, to their outright bafflement. The company even continued to demand the numbers long after its internal studies showed no correlation between the scores and job performance.

    Google ought to know better, to resist being seduced by data’s false charms.

    I love this.  It just goes to show that you really need to look at the much bigger picture and use investigative critical thinking instead of simply sticking your nose in a book and spouting numbers.

    With Google’s stupid SAT & grade point average smell test, they wouldn’t have hired someone like Steve Jobs.
