Tesla's "car-as-service" versus your right to see your data

Espen got a parking ticket for his Tesla, and he's pretty sure he can exonerate himself, if only the company would give him access to his car's data, but they won't.

Beautiful animated air traffic patterns

Air traffic data is great fodder for visualizations. Case in point, this lovely animation of a day of flights titled "North Atlantic Skies" by air traffic control firm NATS. (via Laughing Squid)

Fax Your GP: quick opt-out from insane NHS plan to sell your medical records

The UK National Health Service has initiated a plan to take the nation's private health records and sell them off to private companies in a process overseen by notorious multinational bumblewads ATOS. If you live in England, your records -- mental health records, prescriptions, records of surgeries including abortions, and other sensitive personal information -- will be handed over to a wide-ranging group of companies all over the world.

Unless you opt out. And opting out isn't easy. There's no central place to opt out. Instead, you have to send a letter to your GP's surgery, which means you have to look up your GP's surgery's address, compose a legally sufficient letter, print it out, find an envelope and a stamp -- etc.

However! There's a better way. A group of volunteers whom I trust implicitly, including the astounding Stef Magdalinski (who made the Faxyourmp service that is the ancestor of Theyworkforyou) have created Fax Your GP, a dead-simple form that will look up your GP's fax number for you, create a form opt-out letter you can fill in in just a few easy steps, and then they'll fax that letter directly to your GP's surgery. I just opted out.

Interactive graphic of migration within US

Chris Walker created a fascinating interactive graphic of migration patterns within the United States. It's based on US Census Bureau's 2012 American Community Survey estimates. Here are a few insights that Walker gleaned:

Spoiler: your nearest pizza joint is probably Pizza Hut

Created by Flowing Data, this map reveals exactly what pizza chain dominates in any given 10-mile region of the U.S.

Animation about cell phone data mining

Michael Rigley created this beautiful animation, titled "Network," for his BFA design thesis project at the California College of Art. It's about personal data captured by cell phone providers and is quite relevant this week.

Toronto mayoral disaster: illegal deletion of staffers' email?

More news from the embattled mayor of Toronto, Rob "Laughable Bumblefuck" Ford: after two of his senior staffers walked out on him following questioning by Toronto homicide detectives, it appears that someone illegally ordered the destruction of their archived city emails and call-records -- as well as the archived electronic communications of Ford's former chief of staff, whom Ford fired under mysterious circumstances.

The Star heard concerns at city hall Wednesday afternoon over the potential destruction or hiding of the records of three staffers who resigned or were fired during the ongoing crack cocaine scandal. Sources told the Star the records were in danger after city employees were directed to delete them.

The Star sent a request late Wednesday to the city asking for email and phone records of the three staffers in question for the time period during which the video at the heart of the scandal has been discussed.

Emails sent by city employees, including political staffers, are automatically preserved by the city, though emails related to “personal” business are exempt from freedom of information requests.

Two people familiar with the system said the emails of specific political staffers cannot be permanently erased from the system.

Rob Ford video scandal: Concerns raised over safety of email records

Animated graphic of meteorites seen impacting Earth


Carlo Zapponi created Bolides, a fantastic animated visualization of meteorites that have been seen hitting the Earth. The data source is the Nomenclature Committee of the Meteoritical Society's Meteorite Bulletin. "The word bolide comes from Greek βολίς bolis, which means missile. Astronomers tend to use bolide to identify an exceptionally bright fireball, particularly one that explodes."

Bolides

Satellites trace the appearance of crop circles in Saudi Arabia

It's not the work of aliens. Instead, you can chalk these crop circles up to humans + money + time. And, with the help of satellite imaging, you can watch as humans use money to change the desert over the course of almost 30 years.

Landsat is a United States satellite program that's been in operation since 1972. Eight different satellites (three of them still up there and functioning) have gathered images from all over the world for decades. This data is used to help scientists studying agriculture, geology, and forestry. It's also been used for surveillance and disaster relief.

Now, at Google, you can look at images taken from eight different sites between 1984 and 2012 and watch as people change the face of the planet. In one set of images, you can watch agriculture emerge from the deserts of Saudi Arabia — little green polka-dots of irrigation popping up against a vast swath of tan. In another set, you'll see the deforestation of the Amazon. A third, the growth of Las Vegas. It's a fascinating view of how we shape the world around us, in massive ways, over a relatively short period of time.

Bloomberg publishes CEO-to-employee-pay chart

Alan sez, "Bloomberg got tired of waiting for the SEC to implement its own rule requiring disclosure of data on how many times the median salary the CEO makes for publicly traded companies so they did a little sleuthing of public data and a little averaging math and calculated the ratio for the top 250 of the S&P 500 companies. The data are searchable and sortable and there's space for companies to comment, which quite a few have done. To my surprise Oracle is not #1, though it is the only tech firm in the top 10."

Top CEO Pay Ratios (Thanks, Alan!)

Siri keeps data for "up to two years", but only anonymously

Robert McMillan explains what happens to the data generated and stored with Siri queries: "Once the voice recording is six months old, Apple “disassociates” your user number from the clip, deleting the number from the voice file. But it keeps these disassociated files for up to 18 more months for testing and product improvement purposes." [Wired]

Internet penetration is never correlated with increasing power to dictators, and is often correlated with increased freedom

Philip N Howard wonders if there are any countries that have, on balance, suffered as a result of the coming of the Internet -- say, because improved networks created so many opportunities for dictators to spy on dissidents that it swamped any free speech/free association benefits that the Internet delivered. So he scatter-plotted PolityIV’s democratization scores from 2002/2011, and cross-referenced them with World Bank/ITU data on internet users. The conclusion: by this method, no country experienced a decline in its overall level of democracy as it attained widespread Internet penetration, and many countries experienced a rise in democracy levels that correlated to a rise in Internet penetration.

Are there any countries with high internet diffusion rates, where the regime got more authoritarian? The countries that would satisfy this condition should appear in the top left of the graph. Alas, the only candidates that might satisfy these two conditions are Iran, Fiji, and Venezuela. Over the last decade, the regimes governing these countries have become dramatically more authoritarian. Unfortunately for this claim, their technology diffusion rates are not particularly high.

This was a quick sketch, and much more could be done with this data. Some researchers don’t like the PolityIV scores, and there are plenty of reasons to dislike the internet user numbers. Missing data could be imputed, and there may be more meaningful ways to compare over time. Some countries may have moved in one direction and then changed course, all within the last decade. Some only moved one or two points, and really just became slightly more or less democratic. But I’ve done that work too, without finding the cases Morozov wishes he had.

There are concerning stories of censorship and surveillance coming from many countries. Have the stories added up to dramatic authoritarian tendencies, or do they cancel out the benefits of having more and more civic engagement over digital media? Fancier graphic design might help bring home the punchline. There are still no good examples of countries with rapidly growing internet populations and increasingly authoritarian governments.

Are There Countries Whose Situations Worsened with the Arrival of the Internet?

Sloppy statistics: Do 50% of Americans really think married women should be legally obligated to change their names?

Jill Filipovic wrote an opinion column for The Guardian yesterday, arguing against the practice of women taking their husbands' names when they get married. It ended up linked on Jezebel and found its way to my Facebook feed where one particular statistic caught my eye. Filipovic claimed that 50% of Americans think a woman should be legally required to take her husband's name.

First, some quick clarification of my biases here. Although I write under a hyphenate, I never have legally changed my name. I've never had a desire to do so. In my private life, I'm just Maggie Koerth and always will be. That said, I personally take issue with the implication at the center of Filipovic's article — that women shouldn't change their names and that to do so makes you a bad feminist. For me, this is one of those personal decisions where I'm like, whatever. Make your own choice. Just because I don't get it doesn't mean you're wrong.

But just like I take objection to being all judgey about personal choices, I also take objection to legally mandating personal choices, and I was kind of blown away by the idea that 50% of my fellow Americans think my last name should be illegal.

So I looked into that statistic. And then I got really annoyed.

Book about big data, predictive behavior, and decision making

Kenneth Cukier was on NPR this morning talking about the new book he wrote with Viktor Mayer-Schonberger, "Big Data: A Revolution That Will Transform How We Live, Work and Think." It sounds fascinating and relevant to research I'm doing at Institute for the Future on newfound applications of systems thinking in what we're calling the "coming age of networked matter." Here are some choice bits from the interview:

On how Target identifies pregnant customers

"The example comes from Charles Duhigg, who's a reporter at The New York Times, and he's the one who uncovered the story. What Target was doing was they were trying to find out what customers were likely to be pregnant or not. So what they were able to do was to look at all the different things that couples were buying prior to the pregnancy — such as vitamins at one point, unscented lotion at another point, lots of hand towels at another point — and with that, make a prediction, score the likelihood that this person was pregnant, so that they could then send coupons to the people involved... there might be a coupon for a stroller or for diapers ...

On how Google tracks the flu

"Google stores all of its searches. What they were able to do was go through the database of previous searches to identify what was the likely predictor that there was going to be a flu outbreak in certain regions of America. Now, keep in mind, we pay for the [Centers for Disease Control and Prevention] to look at the United States and find out where flu outbreaks are taking place for the seasonal flu. But the difference is that it takes the CDC about two weeks to report the data. Google does it in real time simply on search queries."

"Big Data: A Revolution That Will Transform How We Live, Work and Think" (Amazon)

The 'Big Data' Revolution: How Number Crunchers Can Predict Our Lives (NPR)

DNA for data storage

Researchers have successfully stored information in synthetic DNA and then sequenced the DNA to read the data. Nick Goldman and his colleagues from the European Bioinformatics Institute (EBI) encoded all of Shakespeare's sonnets, an audio clip of Martin Luther King's "I have a dream" speech, Watson and Crick's paper on DNA's structure, a photo of the EBI, and an explanation of their data conversion technique. Last year, Harvard molecular geneticist George Church encoded a book he had written in DNA, but EBI's breakthroughs are in the way the data is encoded and its error-correction. From the abstract of their scientific paper published at Nature:

We encoded computer files totalling 739 kilobytes of hard-disk storage and with an estimated Shannon information of 5.2 × 10^6 bits into a DNA code, synthesized this DNA, sequenced it and reconstructed the original files with 100% accuracy. Theoretical analysis indicates that our DNA-based storage scheme could be scaled far beyond current global information volumes and offers a realistic technology for large-scale, long-term and infrequently accessed digital archiving. In fact, current trends in technological advances are reducing DNA synthesis costs at a pace that should make our scheme cost-effective for sub-50-year archiving within a decade.

"Synthetic double-helix faithfully stores Shakespeare's sonnets" (Thanks, Mike Pescovitz!)
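To get a feel for what "encoding a file in DNA" means, here's a deliberately naive Python sketch that turns every two bits into one of the four bases and back again. To be clear, this is not the EBI team's scheme: the post above credits their work precisely for its encoding and error correction, and this toy has neither. The base mapping and the function names are my own assumptions.

```python
# Toy illustration only: 2 bits per base, no error correction.
# Assumed mapping (not from the paper): 00 -> A, 01 -> C, 10 -> G, 11 -> T
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, four bases per byte."""
    strand = []
    for byte in data:
        # Walk the byte two bits at a time, most significant pair first.
        for shift in (6, 4, 2, 0):
            strand.append(BASES[(byte >> shift) & 0b11])
    return "".join(strand)


def decode(strand: str) -> bytes:
    """Reverse encode(): read four bases at a time and rebuild each byte."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    data = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        data.append(byte)
    return bytes(data)


message = b"Shall I compare thee to a summer's day?"
strand = encode(message)
print(strand)
print(decode(strand) == message)
```

A real scheme also has to survive synthesis and sequencing errors, which is where the hard part of the research lies.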

In which Santa helps remind us all of the importance of metadata

Metadata is one of those things that is so important, it becomes easy to forget about. We often collect metadata without thinking about it. When we don't collect it — or if we collect it in a sloppy manner — we notice very quickly that something has gone wrong. But when someone says the word "metadata", a large number of us go, "the what now?" And start trying to remember what that word means before we make ourselves sound dumb in conversation.

Metadata is really just information about information — it helps us organize, find, and standardize the things we know and want to know. At the Information Culture blog Bonnie Swoger offers some Christmas-themed examples that will help you remember what metadata is, help you understand why it's such a big deal, and improve your ability to do metadata right.

If you stumbled across this list on the web you might be able to guess what it was, but you couldn’t be sure. It would also be difficult to find this list again if you were looking for it. The list creator might find this pretty useful, but if he or she shared it with others, we would want some added information to help the new user understand what he or she was looking at: this is metadata.

Metadata for this data file:

Who created the data: Santa Claus, North Pole. An email address would be nice. This way we have some contact information in case we need clarification.
Title: “My List” isn’t a title that is conducive to finding the file again. While it might be tempting to just call this “Santa’s list” that won’t help other folks who see this file. The title should be descriptive of what the data file contains, and “Santa’s List” could be many things: Santa’s list of Reindeer? Santa’s list of toys that need to be made? A more descriptive title might be “Santa’s list of naughty and nice children.”
Date created: We don’t want to confuse this year’s list (2012) with last year’s list (2011). This could lead to all sorts of unfortunate events where nice kids get coal, naughty kids get presents, or infants (who weren’t around in 2011) get nothing at all.
Who created the data file: Perhaps Santa created the data, but then used an elf to input the data into a computer file. Many computer programs automatically record this information, although you may not realize this.
How the list was created: Behavioral scans? Parental surveys? Elf on the Shelf reports? All of the above? In order to reuse this data in future research projects, we need to know how it was collected, including collection instruments and methodologies.
Definitions of terms used: What is “naughty” what is “nice”? How did Santa place a child into one category or another?
File type: What kind of file is it? The data here are pretty simple, but Santa has lots of different file formats to choose from: excel, .csv, xml, etc. Knowing the file type helps end users determine if they can use the data.
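
One way to picture those fields together is as a small record saved alongside the data file. What follows is only a sketch in Python. The field names, the email address, and the placeholder values are mine, not anything from Swoger's post.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class DatasetMetadata:
    """Descriptive fields for one data file (field names are illustrative)."""
    data_creator: str       # who gathered the data
    contact: str            # where to send questions
    title: str              # descriptive enough to find the file again
    date_created: str       # keeps this year's list apart from last year's
    file_creator: str       # who entered the data into the file
    collection_method: str  # how the list was put together
    definitions: dict       # what each term in the data means
    file_type: str          # tells users whether they can open the file


santa_list = DatasetMetadata(
    data_creator="Santa Claus, North Pole",
    contact="santa@example.org",  # placeholder address
    title="Santa's list of naughty and nice children",
    date_created="2012",
    file_creator="An elf",  # placeholder; the post leaves this open
    collection_method="Not recorded",  # placeholder; the post only asks the question
    definitions={"naughty": "Not recorded", "nice": "Not recorded"},
    file_type="csv",
)


def to_sidecar_json(meta: DatasetMetadata) -> str:
    """Serialize the record so it can be stored next to the data file."""
    return json.dumps(asdict(meta), indent=2, ensure_ascii=False)


print(to_sidecar_json(santa_list))
```

The "Not recorded" values are the point: writing the record down makes the gaps obvious.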

What the FDA doesn't want to tell you about livestock antibiotic use

Short version: There is LOTS the FDA doesn't want to tell you about livestock antibiotic use. And that matters. As I reminded you yesterday, the antibiotics we use to keep ourselves alive and healthy are rapidly losing their effectiveness against a whole host of diseases. Antibiotic resistance is driven by overuse of antibiotics — both in humans and in animals. And there are lots of antibiotics being used on animals. The trouble is, public health researchers know very little about that use. Because the FDA refuses to release more than the bare minimum of data. For added fun, last year, they stopped even trying to regulate antibiotic use on livestock — opting instead for voluntary self-control systems.

Sitegeist: mobile app mines public data to tell you about the spot you're standing in

Nicko sez, "Sitegeist is a free Android and iPhone app from the Sunlight Foundation that helps you to learn more about your surroundings in seconds. Sitegeist takes public data about the people, housing, history, environment and things to do for any U.S. location and presents it in easy-to-view infographics. Just scroll and swipe your way through the categories to get a feel for the area. Everything from age distributions to political contributions and median home values to record temperatures. It makes complex localized data easy to understand so you can get back to enjoying the neighborhood. The app incorporates publicly available data from a number of sources including the U.S. Census Bureau, the Dark Sky weather API and even Yelp and Foursquare. Sunlight will continue to add and improve on the app as more rich data becomes public."

(Thanks, Nicko!)

Open science event in London this weekend

If you're in London this weekend, you should know that the Wellcome Trust is sponsoring a two-day bioscience hackathon with prizes awarded for the best ideas in four categories: Open Me — collecting data on yourself and making it useful to yourself; Open Research — making biomedical data produced by professional scientists more accessible and useful to everybody; Open Data — creating apps and hardware that allow doctors to better follow what's really happening with their patients; and the idea that is most useful to the public at large.

Breast cancer patients: Stanford launches lymphedema registry study

Lymphedema occurs in about 7% of breast cancer patients who have undergone sentinel lymph node biopsy (to see if disease has spread to these lymph nodes), and in a greater percentage of patients whose nodes end up being removed (because one or more contain cancer) and patients who receive radiation therapy after breast surgery. Lymphedema is basically a chronic swelling of the affected arm, caused by trapped lymph fluid. It can be disabling, disfiguring, and extremely painful.

"Once lymphedema develops, it is permanent," says my friend Dr. Deanna Attai, a breast surgeon in Burbank, CA. "Physical therapy can help minimize swelling and other complications, but there is currently no cure. Early recognition and prompt treatment definitely makes a difference."

Dying old satellites jeopardize future storm coverage

In the NYT, a story about "endangered satellites" that orbit the earth and provide essential data for tracking storms like Hurricane Sandy. But because of "years of mismanagement, lack of financing and delays in launching replacements," they could begin falling apart—with no functional plan in sight to maintain those resources.

Why do some people say the Earth isn't getting hotter?

If you haven't seen the Skeptical Science website yet, you're missing out.

Via Tom Standage

Open-source human genomes

Yesterday, during a World Science Festival panel on human origins and why our species outlasted other species of Homo, geneticist Ed Green mentioned that there were thousands of sequenced human genomes, from all over the world, that had been made publicly available. Our code is open source.

But where do you go to find it? Several folks on Twitter had great suggestions and I wanted to share them here.

The 1000 Genomes Project—organized by researchers at the Wellcome Trust, the National Institutes of Health, and Harvard—is working on sequencing the genomes of 2500 individuals. The data they've already collected is available online. Read a Nature article about The 1000 Genomes Project: Data management and community access.

The Personal Genome Project is interactive. Created by a researcher at Harvard Medical School, the program is aimed at enrolling 100,000 well-informed volunteers who will have their genomes sequenced and linked to anonymized medical data. Everything that's collected will be Creative Commons licensed for public use.

The University of California Santa Cruz Genome Browser is a great place to find publicly available genomes and sequences.

Thanks to Eva Rose, Aatish Bhatia, and Edward Banatt.

When the infographic craze finally goes too far

"Grand Old Party is data visualization project. It is also a set of butt plugs." (Thanks, Ben Goldacre. I think.)

Population growth isn't really our problem

In the course of preparing for a panel here at the Conference on World Affairs, I ran across a 2009 editorial by environmental journalist Fred Pearce, in which he explains why current global population trends aren't as horrific as they're often made out to be. I thought you should read it.

Global population is going up, Pearce writes, but that's not the same thing as saying that birth rates are going up. And, in the long run, that distinction matters. Around the world—not just in the West—human birthrates are decreasing. And they've been decreasing for a really long time.

Wherever most kids survive to adulthood, women stop having so many. That is the main reason why the number of children born to an average woman around the world has been in decline for half a century now. After peaking at between 5 and 6 per woman, it is now down to 2.6.

This is getting close to the “replacement fertility level” which, after allowing for a natural excess of boys born and women who don’t reach adulthood, is about 2.3. The UN expects global fertility to fall to 1.85 children per woman by mid-century. While a demographic “bulge” of women of child-bearing age keeps the world’s population rising for now, continuing declines in fertility will cause the world’s population to stabilize by mid-century and then probably to begin falling.

Far from ballooning, each generation will be smaller than the last. So the ecological footprint of future generations could diminish. That means we can have a shot at estimating the long-term impact of children from different countries down the generations.

What I really like about this essay, though, is how well Pearce articulates the real problem, which is over-consumption. Population and consumption might appear to be intrinsically linked, but they're not. As Pearce points out, global consumption is increasing far faster than global population and the average American family of four uses far more land, far more water, far more energy and produces far more emissions than an Ethiopian family of 11.

This is important. I've heard many, many Americans express their fears about population growth over the years. Pearce's essay makes it clear that, when you do that, you're pretty much being a concern troll. The population problem, while still real, is well on its way to solving itself. The consumption problem, not so much. Population growth is a problem of the poor. Consumption growth is a problem of the rich (which, from a global perspective, includes pretty much everyone in the United States). So when you ignore consumption and pin the blame for global sustainability issues on population, what you're doing is blaming the 99% for the mistakes of the 1%.

Read Fred Pearce's entire essay on Yale Environment 360

Image: Family Portrait, a Creative Commons Attribution Share-Alike (2.0) image from 12567713@N00's photostream

The story of the Apollo 11 moon landing, as told through data (video)

[video link]

This data visualization of the Apollo 11 moon mission gathers social and technical data from the 1969 lunar landing in video form.

The horizontal axis is an interactive timeline. The vertical axis is divided into several sections, each corresponding to a data source. At the top, commentators are present in narratives from Digital Apollo and NASA technical debriefings. Just below are the members of ground control. The middle section is a log-scale graph stretching from Earth (~10E9 ft. away) to the Moon. Utterances from the landing CAPCOM, Duke, the command module pilot, Collins, the mission commander, Armstrong, and the lunar module pilot, Aldrin, are plotted on this graph. The graph is partially overlaid on a composite image of the lunar surface.

More about the data presented, and the story told, at the project's Vimeo page. The project comes from the MIT Laboratory for Automation, Robotics, and Society, and was directed by David Mindell. Via Maria Popova. As noted on Flowing Data, my only disappointment is that they didn't get to the "One small step for [a] man" part!

Additional credits: Visualization Design by Yanni Loukissas, and Francisco Alonso served as Research Assistant.

Heat your home with data

Server farms generate so much heat that they have to run air conditioning year round. That requires energy, which costs money and tends to mean burning more fossil fuels. Meanwhile, in winter, a lot of houses are cold. The people who live there have to turn on the heat, which costs money and tends to mean burning more fossil fuels.

So here's an idea: Why not distribute the hardware from a server farm, putting heat-producing equipment in houses that actually need the heat?

If a home has a broadband Internet connection, it can serve as a micro data center. One, two or three cabinets filled with servers could be installed where the furnace sits and connected with the existing circulation fan and ductwork. Each cabinet could have slots for, say, 40 motherboards — each one counting as a server. In the coldest climate, about 110 motherboards could keep a home as toasty as a conventional furnace does.

The rest of the year, the servers would still run, but the heat generated would be vented to the outside, as harmless as a clothes dryer’s. The researchers suggest that only if the local temperature reached 95 degrees or above would the machines need to be shut down to avoid overheating. (Of course, adding a new outside vent on the side of the house could give some homeowners pause.)

According to the researchers’ calculations, a conventional data center must invest about $400 a year to run each server, or about $16,000 for a cabinet filled with 40 of them. (This includes the costs of building a bricks-and-mortar center and of cooling the machines.)

Having homes host the machines could reduce the need for a company to build new data centers. And the company’s cost to operate the same cabinet in a home would be less than $3,600 a year — and leave a smaller carbon footprint, too. The company’s data center could thus cover the homeowner’s electricity costs for the servers and still come out way ahead financially.
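
Here's the arithmetic from that passage spelled out as a quick Python sketch. The numbers are the ones quoted above; the variable names are mine. The home figure is an upper bound ("less than $3,600"), so the saving it produces is a floor, not an exact amount.

```python
import math

# Figures quoted in the article above.
DATACENTER_COST_PER_SERVER = 400   # dollars per year in a conventional data center
SERVERS_PER_CABINET = 40           # motherboards per cabinet
HOME_CABINET_COST_CEILING = 3_600  # dollars per year; the article says "less than" this
MOTHERBOARDS_FOR_COLDEST_CLIMATE = 110

# Cost of running one full cabinet in a conventional data center.
datacenter_cabinet_cost = DATACENTER_COST_PER_SERVER * SERVERS_PER_CABINET

# Smallest possible yearly saving per cabinet when it sits in a home instead.
minimum_saving_per_cabinet = datacenter_cabinet_cost - HOME_CABINET_COST_CEILING

# Upper bound on the per-server cost in a home.
home_cost_per_server_ceiling = HOME_CABINET_COST_CEILING / SERVERS_PER_CABINET

# Cabinets needed to heat a home in the coldest climate.
cabinets_needed = math.ceil(MOTHERBOARDS_FOR_COLDEST_CLIMATE / SERVERS_PER_CABINET)

print(f"Data center cost per cabinet: ${datacenter_cabinet_cost:,} a year")
print(f"Saving per cabinet in a home: more than ${minimum_saving_per_cabinet:,} a year")
print(f"Home cost per server: under ${home_cost_per_server_ceiling:.0f} a year")
print(f"Cabinets for the coldest climate: {cabinets_needed}")
```

That last figure squares with the "one, two or three cabinets" the researchers describe.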

It could certainly produce some logistical problems with security, but it's an intriguing idea, and a great example of how we can get the energy services we want for much less energy use. The researchers who proposed it, from Microsoft and the University of Virginia, call it a "data furnace." It'll be interesting to see where the idea goes from here.

Read the white paper where the idea of data furnaces was introduced. White papers are not peer-reviewed, by the way.

Read the New York Times article quoted above.

Via Geekwire and Stephen Curry

Image: Dawdle's new servers - front, a Creative Commons Attribution (2.0) image from dawdledotcom's photostream