Understanding spurious correlation in data-mining


Last May, Dave at Euri.ca took a crack at expanding Gabriel Rossman's excellent post on spurious correlation in data. It's an important read for anyone wondering whether the core hypothesis of the Big Data movement is that every sufficiently large pile of horseshit must have a pony in it somewhere. As O'Reilly's Nat Torkington says, "Anyone who thinks it’s possible to draw truthful conclusions from data analysis without really learning statistics needs to read this."
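The pony-in-the-pile problem is easy to reproduce. The sketch below is a minimal illustration and is not taken from either post: it builds one random "target" series and many unrelated random series, then reports the strongest correlation it can find. The series count, series length, seed, and function names are all arbitrary assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_spurious_correlation(n_series=1000, n_points=20, seed=0):
    """Return the largest |r| between a random target and unrelated noise."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_series):
        # Each candidate is pure noise, generated independently of the target.
        noise = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, noise)))
    return best


if __name__ == "__main__":
    best = strongest_spurious_correlation()
    print(f"strongest |r| among pure noise: {best:.2f}")
```

Every series here is noise by construction, so whatever correlation comes out on top is spurious. Searching more series only pushes that maximum higher.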


Young brothers explain Bayes's theorem

These two young fellows are brothers from Palo Alto who've set out to produce a series of videos explaining the technical ideas in my novel Little Brother, and their first installment, explaining Bayes's Theorem, is a very promising start. I'm honored -- and delighted!

Technology behind "Little Brother" - Jamming with Bayes Rule
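For readers who want the formula behind the video, Bayes's theorem says P(H|E) = P(E|H) P(H) / P(E). Here is a minimal sketch. The function name and the numbers (a 1-in-1,000 base rate, a 99 percent hit rate, a 1 percent false-alarm rate) are illustrative assumptions, not figures from the video or the novel.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """P(hypothesis | positive result) by Bayes's theorem."""
    # P(E): total probability of a positive result, from true and false alarms.
    evidence = (true_positive_rate * prior
                + false_positive_rate * (1 - prior))
    return true_positive_rate * prior / evidence


if __name__ == "__main__":
    # Hypothetical test for something only 1 person in 1,000 actually has.
    p = posterior(prior=0.001, true_positive_rate=0.99, false_positive_rate=0.01)
    print(f"chance a flagged person is a true positive: {p:.1%}")  # about 9%
```

Even with a test that is right 99 percent of the time, most of the people it flags are false alarms, because the thing being tested for is so rare.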

Statistics Done Wrong: a guide to spotting and avoiding stats errors


Alex Reinhart's Statistics Done Wrong: The woefully complete guide is an important reference guide, right up there with classics like How to Lie With Statistics. The author has kindly published the whole text free online under a CC-BY license, with an index. It's intended for people with no stats background and is extremely readable and well-presented. The author says he's working on a new edition with new material on statistical modelling.


A visit to the Indian temple where "0" was invented

The BBC's Alex Bellos travels to Gwalior, an Indian city that contains a temple with the oldest known use of the number "0". It's part of an effort to figure out why zero would appear in India, and not in other, earlier civilizations that were mathematically adept. From Bellos' perspective, part of the answer might lie in theology — a mathematical representation of the mystical idea of "nothingness".

American education's use of "value added measures" is statistically bankrupt

American teachers are widely assessed on the basis of "value added measures," a statistical tool for analyzing the outcomes of their teaching. But as Jerry Genovese points out, this is statistically completely bankrupt -- unless you randomize your samples, you get no insight into the quality of the teaching. I asked my father, Gord Doctorow -- a mathematician, math teacher, and professor of education -- what he thought of Genovese's piece, and he sent me some great material, which you'll find after the jump.


Statistics explained with the help of modern dance

If you're the type of person who really needs some good visuals to make a concept stick in your head, this series of YouTube videos made by the British Psychological Society Media Centre will help you remember the meanings behind statistical concepts like "correlation", "frequency distributions", and "sampling error". There are four videos in the series so far, and they do a great job of painting pictures around abstract ideas. Bonus: Soothing music.

Via Openculture

90 percent of Tor keys can be broken by NSA: what does it mean?

Errata Security CEO Rob Graham has published a blog post speculating that ninety percent of the traffic on the Tor anonymized network can be broken by the NSA. That's because the majority of Tor users are still on an old version of the software, 2.3, which uses 1024-bit RSA/DH keys -- and at a key length of 1024 bits, RSA/DH crypto can be broken in a matter of hours using custom chips fabbed at an estimated cost of $1B. It seems likely that the NSA has spent the necessary sum and sourced these chips (likely from IBM).

This isn't the same as being able to decrypt all of Tor in realtime, but it does suggest that the NSA could selectively decrypt its stored archives of Tor traffic.

However, the new version of Tor, 2.4, uses elliptic curve Diffie-Hellman key exchange, which is probably beyond the NSA's reach.

Graham faults the Tor Project for the poor uptake of its new version, though as an Ars Technica commenter points out, popular GNU/Linux distributions like Debian and its derivative Ubuntu are also to blame, since they only distribute the older, weaker version. In either event, this is a wake-up call that will likely spur both the Tor Project and the major distros to push the update.

Yesterday's revelations about the NSA's ability to decrypt 'secure' communications were taken by many to mean that the NSA had made fundamental mathematical or computing breakthroughs that allowed it to decrypt securely enciphered messages. But it's pretty clear that's not what's going on.


NSA probably hasn't broken strong crypto


You may have heard speculation that the NSA has secretly broken the strong cryptographic systems used to keep data secret -- after all, why collect all that scrambled data if they can't unscramble it? But Bruce Schneier argues (convincingly) that this is so impossible as to be fanciful. So why have they done this? My guess is that they're counting on flaws being revealed in the cryptographic implementations in the field (or maybe they've discovered such flaws and are keeping them secret). Or they're hoping for a big breakthrough in the future (quantum computing, anyone?).


Great moments in pedantry: Double Stuf Oreos not actually double stuffed

In fact, the Double Stuf Oreos tested by a high school math class in Queensbury, N.Y. contained only 1.86x the amount of stuff that was in a regular Oreo. A Nabisco spokeswoman, responding to the scandal, says the measurements must have been inaccurate.

Some real math on the real risk of shark attacks


Great white shark. © Oceana/David Stephens.

Shark attack stats: "The real threat is humans. For every one human killed by a shark, there are approximately 25 million sharks killed by humans."

About 200 million people go to U.S. beaches each year. About 36 of them are attacked by sharks, and most of those survive. In contrast, more than 30,000 beach-goers have to be rescued from surfing accidents. And each year many beach-goers die in, or must be rescued from, drowning incidents in which no other creature is to blame.
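Turning those figures into odds makes the gap plain. This is a rough back-of-the-envelope sketch using only the numbers quoted above. It assumes every beach visitor is equally exposed, which is certainly not true, and the variable and function names are my own.

```python
beach_visitors = 200_000_000  # "About 200 million people" per year
shark_attacks = 36            # attacks per year, most of them survived
surf_rescues = 30_000         # "more than 30,000", so treat as a lower bound


def one_in(events, population):
    """Express a rate as 'one in N'."""
    return population / events


attack_odds = one_in(shark_attacks, beach_visitors)
rescue_odds = one_in(surf_rescues, beach_visitors)
ratio = surf_rescues / shark_attacks

if __name__ == "__main__":
    print(f"shark attack: about 1 in {attack_odds:,.0f} visitors")
    print(f"surf rescue:  about 1 in {rescue_odds:,.0f} visitors")
    print(f"surf rescues outnumber shark attacks about {ratio:,.0f} to 1")
```

On these numbers a beach visit carries roughly a one-in-5.6-million chance of a shark attack, against about one in 6,700 for a surfing rescue.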

So, will we see Human Week, or Human-nado mockumentaries any time soon?

[Oceana.org]

At VW's request, English court censors Usenix Security presentation on keyless entry systems for luxury cars


Flavio Garcia, a security researcher from the University of Birmingham, has been ordered by an English court not to deliver an important paper at the Usenix Security conference. Garcia, along with colleagues from a Dutch university, had authored a paper showing the security failings of the keyless entry systems used by a variety of luxury cars. Volkswagen asked an English court for an injunction censoring his work -- which demonstrated their incompetence and the risk they'd exposed their customers to -- and Mr Justice Birss agreed.


Creativity, math, and 12-tone music

We've featured doodling, fast-talking YouTube mathematician Vi Hart a lot here, but her latest, a 30-minute extended mix, is absolutely remarkable, even by her high standards. For 30 glorious minutes, Ms Hart explores the nature of randomness and pattern, using Stravinsky's 12-tone music as a starting-point and rocketing through constellations, the nature of reality, Borges's library, and more. On the way, she ends up with a good working definition of creativity, and explores the dilemma of structure versus creation. Brava, Ms Hart, you have outdone yourself! Plus, I like your copyright jokes.

Twelve Tones

Tic-Tac-Toe squared


Want to play a game of Tic-Tac-Toe that's genuinely challenging? Try "Ultimate Tic-Tac-Toe," in which each square is made up of another, smaller Tic-Tac-Toe board, and to win the square you have to win its mini-game. Ben Orlin says he discovered the game at a mathematicians' picnic, and he explains a wrinkle on the rules:

You don’t get to pick which of the nine boards to play on. That’s determined by your opponent’s previous move. Whichever square he picks, that’s the board you must play in next. (And whichever square you pick will determine which board he plays on next.)...

This lends the game a strategic element. You can’t just focus on the little board. You’ve got to consider where your move will send your opponent, and where his next move will send you, and so on.

The resulting scenarios look bizarre. Players seem to move randomly, missing easy two- and three-in-a-rows. But there’s a method to the madness – they’re thinking ahead to future moves, wary of setting up their opponent on prime real estate. It is, in short, vastly more interesting than regular tic-tac-toe.

Ultimate Tic-Tac-Toe (via Kottke)
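The board-forcing rule Orlin describes fits in a few lines. This is a minimal sketch, not a full game: boards and cells are both numbered 0 to 8, the function names are hypothetical, and the handling of a full board is my own assumption, since the excerpt above doesn't say what happens in that case.

```python
def next_board(cell_index):
    """The cell just played (0-8) names the board the opponent must use."""
    return cell_index


def legal_moves(boards, forced_board):
    """List the (board, cell) pairs open to the player about to move.

    boards is a list of nine small boards, each a list of nine cells
    holding "X", "O", or None for an empty cell.
    """
    targets = [forced_board]
    if forced_board is None or all(c is not None for c in boards[forced_board]):
        # Assumption: free choice on the opening move, or when the
        # forced board has no empty cells left.
        targets = range(9)
    return [(b, c) for b in targets for c in range(9) if boards[b][c] is None]


if __name__ == "__main__":
    boards = [[None] * 9 for _ in range(9)]
    boards[4][0] = "X"               # X plays cell 0 of the center board
    forced = next_board(0)           # so O is sent to board 0
    print(legal_moves(boards, forced))
```

Because every move doubles as a choice of where the opponent plays next, a search over this game has to look at the destination board as well as the local three-in-a-row.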

Math textbook attempts to solve relationship drama

The correct answer is that Brian and Angela just need to break up, already.

From Thanks, Textbooks — a fantastic Tumblr of supremely weird and hilarious textbook examples and questions.

Symmetry and sound

This fantastic video by Vi Hart shows you what the math of music looks like in a visual representation — or, should that be "what visual frieze patterns sound like when turned into music"?

Frieze patterns are symmetrical repeating patterns that show up in architecture, art, and even our model of DNA. According to Hart, this video is:

A visual and musical expression of mathematical symmetry groups. The transformations done to the video are equivalent to the transformations done to the notes.

Very cool to watch! Here's the video link.
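To get a feel for "transformations done to the notes," here is a small sketch of three frieze-style symmetries applied to a melody. It is my own illustration, not necessarily how the video was made: the melody is a hypothetical list of (beat, pitch) pairs using MIDI note numbers, and the function names are invented for the example.

```python
def translate(notes, beats):
    """Slide the motif along the time axis (a frieze translation)."""
    return [(t + beats, p) for t, p in notes]


def mirror_time(notes):
    """Reflect across a vertical axis: the motif played backwards."""
    last = max(t for t, _ in notes)
    return sorted((last - t, p) for t, p in notes)


def mirror_pitch(notes, axis):
    """Reflect across a horizontal axis: rising steps become falling ones."""
    return [(t, 2 * axis - p) for t, p in notes]


if __name__ == "__main__":
    motif = [(0, 60), (1, 62), (2, 64), (3, 67)]  # hypothetical four-note motif
    print(translate(motif, 4))
    print(mirror_time(motif))
    print(mirror_pitch(motif, axis=60))
    # A glide reflection combines a slide with a flip in pitch.
    print(mirror_pitch(translate(motif, 4), axis=60))
```

Each function has a visual twin: slide the picture sideways, flip it left to right, or flip it top to bottom.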

Thanks, Peter Newbury!