Big Data: A Revolution That Will Transform How We Live, Work, and Think


4 Responses to “Big Data: A Revolution That Will Transform How We Live, Work, and Think”

  1. millie fink says:

    Phew, scary stuff about where we’re apparently inevitably going.

    Thanks for a wonderful review of this book.

  2. Luther Blissett says:

    Cory, this is a very interesting comment on the book. I don’t read it as a review, as it tells me less about what the book is about than what it’s not about.

    It seems it’s too long for the majority of bb surfers. But whether or not there’s a wave of comments to ride, I wanted to add some short musings.

    Disclosure: I am working on moderately sized datasets, which have given me terrible headaches during the last couple of years.

    Now, if you just search “big data” on any news aggregator (take google news, which gives you your “local” share of the stuff), I sense something odd.

    1st, BigData is sometimes discussed by applied statisticians and researchers as the new “DataMining”. It’s a buzzword for working with massive amounts of data and coming up with applied methods to detect and predict patterns. It is said that data mining never really held its promise to the stats community: creating some new, exciting opportunities – and jobs, both for people especially qualified in data analysis and new kinds of jobs as a result of these analyses.
    As far as I can see, speedy reaction to some (not so complicated) models makes money (HFT, anyone?). But maybe not much else. Or am I completely missing some important things here? The “other customers who bought ‘The Settlers of Catan’ also bought titles by Wil Wheaton, Robert A. Heinlein and Cory Doctorow” does NOT count. This is making money, not creating anything interesting, and especially not many jobs. Now, why the buzz? What’s the news? Cui bono? Who profits, really? And don’t come up with facebook – last time I looked, the ones getting rich weren’t the “shareholders” (I savour this term in this context, or should I say “Like”?)

    2nd, I am (and some others are, too) perceiving a dichotomy between the hype and the outcome of the analyses. Thesis: BigData is suffocating science instead of producing new insights. Antithesis: BigData is bringing our understanding of processes and patterns to new heights. Now, do figure: how many really insightful studies are coming out of data pools right now?

    The argument goes something like this: scientists used to collect data to test hypotheses (no matter if you were a Bayesian or a frequentist). Nowadays, some of us are fishing in data pools for a meaning, but things like collinearity and the noise added by throwing in lots of variables make it hard to come up with a good explanation of anything. We are already at our limits trying to understand the data we’ve got, but e.g. next-gen sequencing produces data at a speed we probably can’t keep up with. The rate of error in the integrative approaches is expected to be really, really high.
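    To make the “fishing in data pools” worry concrete, here is a toy sketch (all numbers invented, plain Python): correlate one random response against many pure-noise predictors, and a handful of them will clear the conventional p < 0.05 bar by chance alone.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


random.seed(1)  # arbitrary seed, for reproducibility
n_obs, n_vars = 50, 200

# One random "response" and 200 predictors of pure noise --
# by construction there is no real relationship anywhere.
response = [random.gauss(0, 1) for _ in range(n_obs)]
predictors = [[random.gauss(0, 1) for _ in range(n_obs)]
              for _ in range(n_vars)]

# At n = 50, |r| > ~0.28 corresponds to p < 0.05 (two-tailed).
critical_r = 0.279
spurious = sum(1 for p in predictors
               if abs(pearson(p, response)) > critical_r)
print(f"{spurious} of {n_vars} pure-noise variables look 'significant'")
```

    With 200 noise variables you expect roughly 5% of them, i.e. around ten, to look “significant” – which is exactly the spurious-correlation trap the paragraph above describes.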

    Compare the TSA example: if you just keep integrating data into your database, your prediction of whether someone is a terrorist is not gaining accuracy. It’s just gaining precision. And if the interpretation of this precision is flawed – go figure. The same is true for gene functions, metabolic pathways, individual traits, patterns of distribution of individuals, (meta)communities, populations, species etc. (pro parte). Just because we can detect patterns doesn’t make these patterns interpretable, or even predictable. You can insert your favourite analogy here. One that I found funny is the positions of celestial bodies: you’ve got astronomy there. And you’ve got other interpretations of heavenly stuff, also using the zodiac, and coming to interesting conclusions.
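    One way to see why a precise-looking terrorist screen can still be wildly inaccurate is the base-rate effect; a minimal sketch with made-up numbers (the rates below are hypothetical, chosen only for illustration):

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(terrorist | flagged) via Bayes' theorem."""
    p_flagged = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_flagged


# Hypothetical: 1 terrorist per million travellers, a screen that
# catches 99% of them, with a 1% false-alarm rate on everyone else.
p = posterior(prior=1e-6, sensitivity=0.99, false_positive_rate=0.01)
print(f"P(terrorist | flagged) = {p:.6f}")  # on the order of 1 in 10,000
```

    However sharp (precise) the screen’s numbers look, almost everyone it flags is innocent – the rare base rate dominates, and more data of the same kind narrows the estimate without fixing that.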

    And this doesn’t even TOUCH the question of how the data was measured (to which specifications, and with which error margins…), the question of who controls access to the raw data, who has the right to use it in which way, and how it is stored and conserved for future use.

    Most importantly to science, it doesn’t touch the question of how to keep modern research reproducible.

    This was also quite a sermon. I’m not expecting anybody to answer, but this really is an important issue to me. I could have posted this somewhere else in a blog, but at least here it will be read by some tech-savvy nerdist whatever people, and not just left unread by the usual innocent bysurfer.

    Oh. Just BTW: I’m planning to have a cursory look at the book as soon as my library of choice gets it. You made me curious, but not curious enough to buy it. ;)

    [edit: corrected some typos, plese kep teh rest.]

  3. marek says:

    It was interesting to read this on the same day as a fascinating post by Will Davies on the trend towards developing public policy, which includes:

    “The very character of Big Data is that it is collected with no particular purpose or theory in mind; it arises as a side-effect of other transactions and activities. It is, supposedly, ‘theory neutral’, like RCTs, indeed the techno-utopians have already argued that it can ultimately replace scientific theory altogether. Hence the facts that arise from big data are somehow self-validating; they simply emerge, thanks to mathematical analysis of patterns, meaning that there is no limit to the number of type of facts that can emerge. There’s almost no need for human cognition at all!

    “The problem with all of this, politically, is that causal theories and methodologies aren’t simply clumsy ways of interfering with empirical data, but also provide (wittingly or otherwise) standards of valuation. ”

    Well worth reading the whole thing –

  4. Luther Blissett says:

    Thanks, marek! That certainly was interesting, adding a very different perspective. Top-down, from the policies possibly inferred from data analysis. This is very thought-provoking indeed. I’m coming from the opposite direction, from the side of data analysis. My opinion is that analysts might ask the wrong questions about what causes the patterns, and therefore get the whole explanation wrong. They might deduce from prior assumptions, fed with a lot of (noisy?) data, overfitting their models, and/or induce from spurious correlations and collinearities.

    Sidenote: there’s one comment on the page about what I like to call the “perfect map analogy” – and it’s getting the key message of Davies’ text wrong. These analyses and models do work, and you can infer or induce generalised hypotheses from the data. Davies’ question is not whether it’s workable; the question is whether it’s sensible. I admit he writes something in this direction:

    Without the extreme simplifications of rationalist theories, society would appear too complex to be governed at all. The empiricist response to the government’s paper title, ‘What Works’, might end up being ‘very little’, unless government becomes frighteningly ‘smart’.

    I think ‘smart’ is misleading here. Governments/companies/scientists already are ‘smart’ in a way, in that they are collecting a lot of data, and we have started to integrate this into BigData. Now, the answer is no longer ‘very little’, but ‘with a p < 0.001, this works’. Without being reproducible (or, maybe in the case of politicians: understandable), and probably still without strong (real-world) predictive value in a very complex system. It might be, I muse, that the perfect map analogy is more in my line of argument. Add so much complexity, and you find patterns everywhere, rendering predictions useless. But I have to think about this some more, I admit.
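    The “fit everything, predict nothing” failure mode can be sketched in a few lines (a toy example, all numbers invented): fit a maximally complex model – here, an exact interpolating polynomial – through noisy observations of a simple process. It “explains” the training data perfectly and still predicts poorly off the training points.

```python
import random


def lagrange(xs, ys, x):
    """Evaluate the interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def true_process(x):
    """The real underlying process is just a straight line."""
    return x


random.seed(7)  # arbitrary seed, for reproducibility
xs = [i / 9 for i in range(10)]
ys = [true_process(x) + random.gauss(0, 0.1) for x in xs]  # noisy data

# A degree-9 polynomial through 10 points fits the training data exactly...
in_sample = max(abs(lagrange(xs, ys, x) - y) for x, y in zip(xs, ys))

# ...but between the observed points it tracks the noise, not the line.
out_sample = abs(lagrange(xs, ys, 0.95) - true_process(0.95))
print(in_sample, out_sample)
```

    The in-sample error is zero – every pattern in the data, including the noise, has been “found” – while the held-out error is not, which is exactly the perfect-map problem: a model as complex as the data stops being a prediction at all.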

Leave a Reply