# Understanding spurious correlation in data-mining

Last May, Dave at Euri.ca took a crack at expanding Gabriel Rossman's excellent post on spurious correlation in data. It's an important read for anyone wondering whether the core hypothesis of the Big Data movement is that every sufficiently large pile of horseshit must have a pony in it somewhere. As O'Reilly's Nat Torkington says, "Anyone who thinks it’s possible to draw truthful conclusions from data analysis without really learning statistics needs to read this."

# Young brothers explain Bayes's theorem

These two young fellows are brothers from Palo Alto who've set out to produce a series of videos explaining the technical ideas in my novel Little Brother, and their first installment, explaining Bayes's Theorem, is a very promising start. I'm honored -- and delighted!

# Statistics Done Wrong: a guide to spotting and avoiding stats errors

Alex Reinhart's Statistics Done Wrong: The woefully complete guide is an important reference guide, right up there with classics like How to Lie With Statistics. The author has kindly published the whole text free online under a CC-BY license, with an index. It's intended for people with no stats background and is extremely readable and well-presented. The author says he's working on a new edition with new material on statistical modelling.

# A visit to the Indian temple where "0" was invented

The BBC's Alex Bellos travels to Gwalior, an Indian city that contains a temple with the oldest known use of the number "0". It's part of an effort to figure out why zero would appear in India, and not in other, earlier civilizations that were mathematically adept. From Bellos' perspective, part of the answer might lie in theology — a mathematical representation of the mystical idea of "nothingness".

# American education's use of "value added measures" is statistically bankrupt

American teachers are widely assessed on the basis of "value added measures," a statistical tool for analyzing the outcomes of their teaching. But as Jerry Genovese points out, this is statistically completely bankrupt -- unless you randomize your samples, you get no insight into the quality of the teaching. I asked my father, Gord Doctorow -- a mathematician, math teacher, and professor of education -- what he thought of Genovese's piece, and he sent me some great material, which you'll find after the jump.

# Statistics explained with the help of modern dance

If you're the type of person who really needs some good visuals to make a concept stick in your head, this series of YouTube videos made by the British Psychological Society Media Centre will help you remember the meanings behind statistical concepts like "correlation", "frequency distributions", and "sampling error". There are four videos in the series so far, and they do a great job of painting pictures around abstract ideas. Bonus: Soothing music.

Via Openculture

# 90 percent of Tor keys can be broken by NSA: what does it mean?

Errata Security CEO Rob Graham has published a blog-post speculating that ninety percent of the traffic on the Tor anonymized network can be broken by the NSA. That's because the majority of Tor users are still on an old version of the software, 2.3, which uses 1024-bit RSA/DH keys -- and at key lengths of 1024 bits, RSA/DH crypto can be broken in a matter of hours using custom chips fabbed at an estimated cost of \$1B. It seems likely that the NSA has spent the necessary sum and sourced these chips (likely from IBM).

This isn't the same as being able to decrypt all of Tor in realtime, but it does suggest that the NSA could selectively decrypt its stored archives of Tor traffic.

However, the new version of Tor, 2.4, uses elliptic-curve Diffie-Hellman ciphers, which are probably beyond the NSA's reach.

Graham faults the Tor Project for the poor uptake of its new version, though as an Ars Technica commenter points out, popular GNU/Linux distributions like Debian and its derivative Ubuntu are also to blame, since they only distribute the older, weaker version. In either event, this is a wake-up call that will likely spur both the Tor Project and the major distros to push the update.

Yesterday's revelations about the NSA's ability to decrypt 'secure' communications were taken by many to mean that the NSA had made fundamental mathematical or computing breakthroughs that allowed it to decrypt securely enciphered messages. But it's pretty clear that's not what's going on.

# NSA probably hasn't broken strong crypto

You may have heard speculation that the NSA has secretly broken the strong cryptographic systems used to keep data secret -- after all, why collect all that scrambled data if they can't unscramble it? But Bruce Schneier argues (convincingly) that this is so impossible as to be fanciful. So why have they done this? My guess is that they're counting on flaws being revealed in the cryptographic implementations in the field (or maybe they've discovered such flaws and are keeping them secret). Or they're hoping for a big breakthrough in the future (quantum computing, anyone?).

# Great moments in pedantry: Double Stuf Oreos not actually double stuffed

In fact, the Double Stuf Oreos tested by a high school math class in Queensbury, N.Y. contained only 1.86x the amount of stuff that was in a regular Oreo. A Nabisco spokeswoman, responding to the scandal, says the measurements must have been inaccurate.

# Some real math on the real risk of shark attacks

Great white shark. © Oceana/David Stephens.

Shark attack stats: "The real threat is humans. For every one human killed by a shark, there are approximately 25 million sharks killed by humans."

About 200 million people go to U.S. beaches each year. About 36 of those hundreds of millions are attacked by sharks, and most of them survive. In contrast, more than 30,000 of those beach-goers have to be rescued from surfing accidents. And many humans each year die, or must be rescued, from drowning incidents in which no other creature is to blame.

So, will we see Human Week, or Human-nado mockumentaries any time soon?

[Oceana.org]

# At VW's request, English court censors Usenix Security presentation on keyless entry systems for luxury cars

Flavio Garcia, a security researcher from the University of Birmingham, has been ordered by an English court not to deliver an important paper at the Usenix Security conference. Garcia, along with colleagues from a Dutch university, had authored a paper showing the security failings of the keyless entry systems used by a variety of luxury cars. Volkswagen asked an English court for an injunction censoring his work -- which demonstrated their incompetence and the risk they'd exposed their customers to -- and Mr Justice Birss agreed.

# Creativity, math, and 12-tone music

We've featured doodling, fast-talking YouTube mathematician Vi Hart a lot here, but her latest, a 30-minute extended mix, is absolutely remarkable, even by her high standards. For 30 glorious minutes, Ms Hart explores the nature of randomness and pattern, using Stravinsky's 12-tone music as a starting-point and rocketing through constellations, the nature of reality, Borges's library, and more. On the way, she ends up with a good working definition of creativity, and explores the dilemma of structure versus creation. Brava, Ms Hart, you have outdone yourself! Plus, I like your copyright jokes.

# Tic-Tac-Toe squared

Want to play a game of Tic-Tac-Toe that's genuinely challenging? Try "Ultimate Tic-Tac-Toe," in which each square is made up of another, smaller Tic-Tac-Toe board, and to win the square you have to win its mini-game. Ben Orlin says he discovered the game at a mathematicians' picnic, and he explains a wrinkle on the rules:

You don’t get to pick which of the nine boards to play on. That’s determined by your opponent’s previous move. Whichever square he picks, that’s the board you must play in next. (And whichever square you pick will determine which board he plays on next.)...

This lends the game a strategic element. You can’t just focus on the little board. You’ve got to consider where your move will send your opponent, and where his next move will send you, and so on.

The resulting scenarios look bizarre. Players seem to move randomly, missing easy two- and three-in-a-rows. But there’s a method to the madness – they’re thinking ahead to future moves, wary of setting up their opponent on prime real estate. It is, in short, vastly more interesting than regular tic-tac-toe.

# Math textbook attempts to solve relationship drama

The correct answer is that Brian and Angela just need to break up, already.

From Thanks, Textbooks — a fantastic Tumblr of supremely weird and hilarious textbook examples and questions.

# Symmetry and sound

This fantastic video by Vi Hart shows you what the math of music looks like in a visual representation — or, should that be "what visual frieze patterns sound like when turned into music"?

Frieze patterns are symmetrical repeating patterns that show up in architecture, art, and even our model of DNA. According to Hart, this video is:

A visual and musical expression of mathematical symmetry groups. The transformations done to the video are equivalent to the transformations done to the notes.

Very cool to watch! Here's the video link.

Thanks, Peter Newbury!

# Why math-fans really love set theory

Turns out, math fans dig set theory for almost exactly the same reason that some Christian fundamentalists absolutely hate it — all that messy uncertainty, which is either an affront to the idea of intelligent design or really, really sexy and fascinating, depending on your outlook.

At Nautilus, which is currently hosting an entire issue on the topic of uncertainty, math professor Ayalur Krishnan writes about an idea in set theory that he calls "The Deepest Uncertainty". This is the Continuum Hypothesis — an idea that, paradoxically, can be proven to be unprovable and proven to be something you can't disprove. (And, with that, I've just typed the word "proven" so many times that it has lost all meaning in my brain.)

The uncertainty surrounding the Continuum Hypothesis is unique and important because it is nested deep within the structure of mathematics itself. This raises profound issues concerning the philosophy of science and the axiomatic method. Mathematics has been shown to be “unreasonably effective” in describing the universe. So it is natural to wonder whether the uncertainties inherent to mathematics translate into inherent uncertainties about the way the universe functions. Is there a fundamental capriciousness to the basic laws of the universe? Is it possible that there are different universes where mathematical facts are rendered differently? Until the Continuum Hypothesis is resolved, one might be tempted to conclude that there are.

Read the full story, which explains what set theory and the Continuum Hypothesis actually are. I could do that here, but then this link would end up being as long as the story it's trying to link you to. Ahhhh, set theory.

# Fabergé Fractals

Here's a mesmerizing gallery of "Fabergé Fractals" created by Tom Beddard, whose site also features a 2011 video of Fabergé-inspired fractal landscapes that must be seen to be believed. They're all made with Fractal Lab, a WebGL-based renderer Beddard created.

# Unknown mathematician makes historical breakthrough in prime theory

Yitang Zhang is a largely unknown mathematician who has struggled to find an academic job after he got his PhD, working at a Subway sandwich shop before getting a gig as a lecturer at the University of New Hampshire. He's just had a paper accepted for publication in Annals of Mathematics, which appears to make a breakthrough towards proving one of mathematics' oldest, most difficult, and most significant conjectures, concerning "twin" prime numbers. According to the Simons Science News article, Zhang is shy, but is a very good, clear writer and lecturer.

For hundreds of years, mathematicians have speculated that there are infinitely many twin prime pairs. In 1849, French mathematician Alphonse de Polignac extended this conjecture to the idea that there should be infinitely many prime pairs for any possible finite gap, not just 2.

Since that time, the intrinsic appeal of these conjectures has given them the status of a mathematical holy grail, even though they have no known applications. But despite many efforts at proving them, mathematicians weren’t able to rule out the possibility that the gaps between primes grow and grow, eventually exceeding any particular bound.

Now Zhang has broken through this barrier. His paper shows that there is some number N smaller than 70 million such that there are infinitely many pairs of primes that differ by N. No matter how far you go into the deserts of the truly gargantuan prime numbers — no matter how sparse the primes become — you will keep finding prime pairs that differ by less than 70 million.

The result is “astounding,” said Daniel Goldston, a number theorist at San Jose State University. “It’s one of those problems you weren’t sure people would ever be able to solve.”

Unknown Mathematician Proves Elusive Property of Prime Numbers [Erica Klarreich/Wired/Simons Science News]
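If "prime pairs for any finite gap" feels abstract, here's a minimal sketch that lists consecutive primes below a small limit and picks out the pairs separated by a given gap. The function names `primes_below` and `pairs_with_gap` are made up for illustration, and nothing here reflects Zhang's method; it only shows what the conjectures are counting.

```python
import math

def primes_below(limit):
    """Return every prime smaller than `limit` using a simple sieve of Eratosthenes."""
    if limit < 3:
        return []
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for n in range(2, math.isqrt(limit) + 1):
        if sieve[n]:
            # Cross out every multiple of n, starting at n*n.
            sieve[n * n::n] = bytearray(len(range(n * n, limit, n)))
    return [n for n, is_prime in enumerate(sieve) if is_prime]

def pairs_with_gap(primes, gap):
    """Consecutive primes (p, q) with q - p == gap."""
    return [(p, q) for p, q in zip(primes, primes[1:]) if q - p == gap]

small_primes = primes_below(100)
twins = pairs_with_gap(small_primes, 2)
largest_gap = max(q - p for p, q in zip(small_primes, small_primes[1:]))

print("Twin prime pairs below 100:", twins)
print("Largest gap between consecutive primes below 100:", largest_gap)
```

Raising the limit shows the twin pairs thinning out, which is the "desert" the quoted article describes. Whether they ever stop entirely is exactly what remains unproven.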

(Photo: University of New Hampshire)

# Life of astronaut Sally Ride honored in Kennedy Center tribute

American astronaut Sally Ride monitors control panels from the pilot's chair on the flight deck in 1983. Photo by Apic/Getty Images, via PBS NewsHour.

Tonight, PBS NewsHour science correspondent Miles O'Brien will serve as master of ceremonies in a Kennedy Center gala honoring the life and legacy of astronaut Sally Ride. The tribute will highlight her impact on the space program and her lifelong commitment to promoting youth science literacy.

Her Sally Ride Science organization reached out to girls, encouraging them to pursue careers in the Science, Technology, Engineering and Math (STEM) fields, where a gender gap persists.

At the PBS NewsHour website, read the column Miles wrote immediately following Ride's death in July 2012, 17 months after she was diagnosed with pancreatic cancer.

# Death, be not infrequent

The oldest person in the world died this year. But don't worry if you missed the event. The oldest person in the world will likely die next year, as well. In fact, according to mathematician Marc van Leeuwen, an "oldest person in the world" will die roughly every 0.65 years.

# Looking for mathematical perfection in all the wrong places

The Golden Ratio — that geometric expression of the Fibonacci sequence of numbers (1, 1, 2, 3, 5, etc.) — has influenced the way master painters created art and can be spotted occurring naturally in the seed arrangement on the face of a sunflower. But its serendipitous appearances aren't nearly as frequent as pop culture would have you believe, writes Samuel Arbesman at The Nautilus. In fact, one of the most common examples of mathematical perfection — the chambered nautilus shell — actually isn't. Even math can become part of the myths we tell ourselves as we try to create meaning in the universe.

Image: Golden Ratio, a Creative Commons Attribution (2.0) image from ernestduffoo's photostream

# The mathematics of tabloid news

Leila Schneps and Coralie Colmez have an interesting piece at The New York Times about DNA evidence in murder trials, the mathematics of probability, and the highly publicized case of Amanda Knox. What good is remembering the math you learned in junior high? If you're a judge, it could be the difference between a guilty verdict and an acquittal.

# Weird probabilities of non-transitive "Grime Dice"

Michael de Podesta has been doing the math on "Grime Dice" -- six-sided cubes whose sides average out to 3.5, but whose face values are all radically different:

The interesting thing about these is that the odds of one die beating another are simple to calculate, but shift radically once you start rolling dice in pairs. It's a beautiful piece of counterintuitive probability math:

The amazing property of these dice is discernible when you use them competitively – i.e. you roll one dice against another. If you roll each of them against a normal dice then as you might expect, each dice will win as often as it will lose. But if you roll them against each other something amazing happens.

• Dice A will systematically beat Dice B
• Dice B will systematically beat Dice C

and amazingly

• Dice C will systematically beat Dice A

So the fact that Dice A beats Dice B, and Dice B beats Dice C does not ensure that Dice A will beat Dice C. Wow!

And how about this: If you ‘double up’ and roll 2 Dice A's against 2 Dice B's – the odds change around and now the B's will beat the A's! Is that really possible? Well yes, and just to convince myself I wrote a Spreadsheet (.xlsx file) and generated the tables at the bottom of the article. If you download it you can change the numbers to try out other combinations.
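If you'd rather not open a spreadsheet, the same check can be sketched in a few lines of Python by enumerating every possible roll. One loud assumption: the face values aren't listed in the text above, so the `DICE` table below uses a commonly cited three-dice set whose faces each sum to 21 (averaging 3.5). Treat those numbers, and the helper names, as illustrative rather than as de Podesta's exact figures.

```python
from fractions import Fraction
from itertools import product

# ASSUMED face values (not given in the post): each die sums to 21, so it averages 3.5.
DICE = {
    "A": (3, 3, 3, 3, 3, 6),
    "B": (2, 2, 2, 5, 5, 5),
    "C": (1, 4, 4, 4, 4, 4),
}

def totals(die, count):
    """Every equally likely total from rolling `count` copies of `die`."""
    return [sum(roll) for roll in product(die, repeat=count)]

def win_probability(first, second, count=1):
    """Exact chance that `count` copies of `first` out-total `count` copies of `second`."""
    a = totals(DICE[first], count)
    b = totals(DICE[second], count)
    wins = sum(1 for x in a for y in b if x > y)
    return Fraction(wins, len(a) * len(b))

for first, second in [("A", "B"), ("B", "C"), ("C", "A")]:
    print(first, "beats", second, "with probability", win_probability(first, second))

print("Two A's beat two B's with probability", win_probability("A", "B", count=2))
print("Two B's beat two A's with probability", win_probability("B", "A", count=2))
```

Using `Fraction` keeps the odds exact, so you can see at a glance which side of one-half each matchup lands on. Swap in other face values to try different combinations.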

# Celebrate "Pi Day" by throwing hot dogs down a hallway

No, that's not a euphemism for anything. Buffon's Needle is an 18th-century experiment in probability mathematics and geometry that can be used as a way to calculate pi through random sampling. This WikiHow posting explains how you can recreate Buffon's Needle at home, by playing with your food.
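If you're short on hot dogs, here's a rough simulation of the experiment. It assumes the classic setup: parallel lines a fixed distance apart, a needle no longer than that spacing, and the standard result that the needle crosses a line with probability 2 × length / (π × spacing). The function name, parameters and seed are invented for this sketch, and the random direction is picked by rejection sampling so the code never sneaks in the value of pi it's trying to estimate.

```python
import math
import random

def estimate_pi(throws, needle=1.0, spacing=1.0, seed=0):
    """Estimate pi by simulating Buffon's Needle (requires needle <= spacing)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(throws):
        # Distance from the needle's centre to the nearest line.
        centre = rng.uniform(0.0, spacing / 2)
        # Pick a random direction by sampling a point inside the unit disc.
        while True:
            u, v = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
            r2 = u * u + v * v
            if 0.0 < r2 <= 1.0:
                break
        # Sine of the angle between the needle and the lines.
        sin_theta = abs(v) / math.sqrt(r2)
        # The needle crosses a line if its half-length reaches across the gap.
        if centre <= (needle / 2) * sin_theta:
            hits += 1
    # Rearranged from P(cross) = 2 * needle / (pi * spacing).
    return 2 * needle * throws / (spacing * hits)

print(estimate_pi(100_000))
```

The estimate converges slowly: the error shrinks roughly with the square root of the number of throws, so expect only a couple of correct digits even after a hundred thousand virtual hot dogs.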

# Calculus-performing mechanical calculator

A clip from the Discovery Channel's Dirty Jobs program on tanneries demonstrates the workings of a calculus-performing mechanical calculator that measures the surface-area of irregularly shaped hides with a fascinating and clever set of gears, calipers and ratchets.

Dirty Jobs - Tannery Mechanical Surface Integrator (Thanks, Dad!)

# Game theory and bad behavior on Wall Street

An opinion piece by Chris Arnade on the asymmetry in pay (money for profits, flat for losses), which he describes as "the engine behind many of Wall Street’s mistakes." That asymmetry "rewards short-term gains without regard to long-term consequences," Chris writes in a new guest blog at Scientific American. "The results? The over-reliance on excessive leverage, banks that are loaded with opaque financial products, and trading models that are flawed." [Scientific American Blog Network]

# The world's largest prime number — visualized

Philip Bump took the recently discovered 17-million-digit prime number and, six digits at a time, converted it into RGB colors. This is the result.
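The post doesn't say exactly how each six-digit chunk becomes a color, so here's one plausible sketch, not necessarily Bump's method: read each chunk as a hex color string, two digits per channel. The function name is made up, and a much smaller Mersenne prime (2^127 − 1) stands in for the 17-million-digit one.

```python
def digits_to_rgb(number):
    """Split a number's decimal digits into six-digit chunks and read each as RRGGBB.

    ASSUMPTION: each pair of decimal digits is interpreted as a hex byte, so
    channels range from 0x00 to 0x99. Leftover digits at the end are dropped.
    """
    digits = str(number)
    usable = len(digits) - len(digits) % 6
    colours = []
    for start in range(0, usable, 6):
        chunk = digits[start:start + 6]
        colours.append(tuple(int(chunk[i:i + 2], 16) for i in (0, 2, 4)))
    return colours

small_prime = 2**127 - 1  # a 39-digit stand-in for the record prime
for red, green, blue in digits_to_rgb(small_prime):
    print(f"#{red:02x}{green:02x}{blue:02x}")
```

Because decimal digits never exceed 9, this mapping can't reach the brightest values in any channel. A different scaling (say, stretching 00–99 across 0–255) would give a brighter picture from the same digits.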

# Neil deGrasse Tyson on pi and other constants

Both the Bible and the Indiana State Legislature have tried to redefine pi to equal something much simpler than 3.14159265358979323846264338327950 ...

# 86.54% liked this

Science blogger Matt Springer analyzes the surprisingly fascinating math behind Reddit upvotes.

# Probability theory for programmers

Jeremy Kun, a mathematics PhD student at the University of Illinois in Chicago, has posted a wonderful primer on probability theory for programmers on his blog. It's a subject vital to machine learning and data-mining, and it's at the heart of much of the stuff going on with Big Data. His primer is lucid and easy to follow, even for math ignoramuses like me.

For instance, suppose our probability space is $\Omega = \left \{ 1, 2, 3, 4, 5, 6 \right \}$ and $f$ is defined by setting $f(x) = 1/6$ for all $x \in \Omega$ (here the “experiment” is rolling a single die). Then we are likely interested in more exquisite kinds of outcomes; instead of asking the probability that the outcome is 4, we might ask what is the probability that the outcome is even? This event would be the subset $\left \{ 2, 4, 6 \right \}$, and if any of these are the outcome of the experiment, the event is said to occur. In this case we would expect the probability of the die roll being even to be 1/2 (but we have not yet formalized why this is the case).

As a quick exercise, the reader should formulate a two-dice experiment in terms of sets. What would the probability space consist of as a set? What would the probability mass function look like? What are some interesting events one might consider (if playing a game of craps)?
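Since this is a primer for programmers, here's one way the excerpt's single-die example, and one possible answer to its two-dice exercise, might look in code. This is a sketch assuming fair dice, with the two-dice space modeled as ordered pairs; the helper names `uniform_pmf` and `probability` are invented here and don't come from Kun's post.

```python
from fractions import Fraction
from itertools import product

def uniform_pmf(space):
    """Probability mass function giving every outcome in `space` equal weight."""
    weight = Fraction(1, len(space))
    return {outcome: weight for outcome in space}

def probability(event, pmf):
    """Probability of an event, i.e. a subset of the probability space."""
    return sum((pmf[outcome] for outcome in event), Fraction(0))

# One die: Omega = {1, ..., 6}, f(x) = 1/6.
omega = [1, 2, 3, 4, 5, 6]
f = uniform_pmf(omega)
even = {x for x in omega if x % 2 == 0}
print("P(even) =", probability(even, f))

# Two dice: the space is all 36 ordered pairs, each with mass 1/36.
two_dice = list(product(omega, repeat=2))
g = uniform_pmf(two_dice)
seven = {(a, b) for a, b in two_dice if a + b == 7}
print("P(sum is 7) =", probability(seven, g))
```

Representing an event as a plain Python set mirrors the excerpt's definition: the event "occurs" when the outcome is a member of the set, and its probability is the sum of the masses of its members.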

(Image: Dice, a Creative Commons Attribution (2.0) image from artbystevejohnson's photostream)