
In which Ye Olde Metadata Network tracks that traitor Paul Revere

Sociologist Kieran Healy does a nice job of explaining how even a data system that doesn't contain the actual content of conversations can be part of a very powerful surveillance state. In a piece that is part parody and part demonstration, he uses information about organization membership rolls in 18th-century Boston to pinpoint Paul Revere as a key player in a network of "traitors". Maggie
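
To make the idea concrete, here is a minimal sketch of how membership lists alone can single someone out. The groups and names are invented toy data, not Healy's dataset, and the function names are hypothetical; the sketch simply counts how many groups each pair of people shares and then asks who is linked to the most other people.

```python
from collections import Counter
from itertools import combinations

# Hypothetical membership rolls (group -> members), invented for illustration.
memberships = {
    "club_a": {"alice", "bob", "carol"},
    "club_b": {"carol", "dave"},
    "club_c": {"carol", "erin", "bob"},
}

def shared_groups(memberships):
    """Count, for every pair of people, the groups they both belong to."""
    pairs = Counter()
    for members in memberships.values():
        for a, b in combinations(sorted(members), 2):
            pairs[(a, b)] += 1
    return pairs

def connection_counts(pairs):
    """Count how many distinct people each person is linked to."""
    links = Counter()
    for a, b in pairs:
        links[a] += 1
        links[b] += 1
    return links

pairs = shared_groups(memberships)
links = connection_counts(pairs)
most_connected = max(links, key=links.get)
print(most_connected, links[most_connected])
```

No conversation content appears anywhere in this data, yet the person who bridges the most groups still stands out.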

In which Santa helps remind us all of the importance of metadata

Metadata is one of those things that is so important, it becomes easy to forget about. We often collect metadata without thinking about it. When we don't collect it, or if we collect it in a sloppy manner, we notice very quickly that something has gone wrong. But when someone says the word "metadata", a large number of us go, "the what now?" and start trying to remember what that word means before we make ourselves sound dumb in conversation.

Metadata is really just information about information — it helps us organize, find, and standardize the things we know and want to know. At the Information Culture blog Bonnie Swoger offers some Christmas-themed examples that will help you remember what metadata is, help you understand why it's such a big deal, and improve your ability to do metadata right.

If you stumbled across this list on the web you might be able to guess what it was, but you couldn’t be sure. It would also be difficult to find this list again if you were looking for it. The list creator might find this pretty useful, but if he or she shared it with others, we would want some added information to help the new user understand what he or she was looking at: this is metadata.

Metadata for this data file:

Who created the data: Santa Claus, North Pole. An email address would be nice. This way we have some contact information in case we need clarification.
Title: “My List” isn’t a title that is conducive to finding the file again. While it might be tempting to just call this “Santa’s list,” that won’t help other folks who see this file. The title should be descriptive of what the data file contains, and “Santa’s List” could be many things: Santa’s list of reindeer? Santa’s list of toys that need to be made? A more descriptive title might be “Santa’s list of naughty and nice children.”
Date created: We don’t want to confuse this year’s list (2012) with last year’s list (2011). This could lead to all sorts of unfortunate events where nice kids get coal, naughty kids get presents, or infants (who weren’t around in 2011) get nothing at all.
Who created the data file: Perhaps Santa created the data, but then used an elf to input the data into a computer file. Many computer programs automatically record this information, although you may not realize this.
How the list was created: Behavioral scans? Parental surveys? Elf on the Shelf reports? All of the above? In order to reuse this data in future research projects, we need to know how it was collected, including collection instruments and methodologies.
Definitions of terms used: What is “naughty”? What is “nice”? How did Santa place a child into one category or another?
File type: What kind of file is it? The data here are pretty simple, but Santa has lots of different file formats to choose from: Excel, .csv, .xml, etc. Knowing the file type helps end users determine if they can use the data.
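
As a rough sketch, the items above could be captured in a small machine-readable record stored alongside the data file. The class and field names here are hypothetical and do not follow any particular metadata standard; the values come from the example where the example gives them, and are left empty where it does not.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import Dict, List, Optional

@dataclass
class ListMetadata:
    # Hypothetical field names that mirror the items in the list above.
    title: str
    data_creator: str
    date_created: str                      # a year or an ISO date
    file_type: str
    file_creator: Optional[str] = None     # e.g. the elf who typed it in
    contact_email: Optional[str] = None
    collection_methods: List[str] = field(default_factory=list)
    definitions: Dict[str, str] = field(default_factory=dict)

def missing_fields(meta):
    """Return the names of fields left empty (None, '', [] or {})."""
    return [name for name, value in asdict(meta).items() if not value]

record = ListMetadata(
    title="Santa's list of naughty and nice children",
    data_creator="Santa Claus, North Pole",
    date_created="2012",
    file_type="csv",
    collection_methods=["parental surveys", "Elf on the Shelf reports"],
)

sidecar = json.dumps(asdict(record), indent=2)  # text for a sidecar file
gaps = missing_fields(record)
print(sidecar)
print("Still undocumented:", gaps)
```

Listing the gaps makes it obvious what a new user would still have to ask about: here, who entered the data, how to contact them, and what "naughty" and "nice" actually mean.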

Read the full story and get more great examples

Massive public domain catalog dump from Harvard

David Weinberger writes, "Harvard University has today put into the public domain (CC0) full bibliographic information about virtually all the 12M works in its 73 libraries. This is (I believe) the largest and most comprehensive such contribution. The metadata, in the standard MARC21 format, is available for bulk download from Harvard. The University also provided the data to the Digital Public Library of America’s prototype platform for programmatic access via an API. The aim is to make rich data about this cultural heritage openly available to the Web ecosystem so that developers can innovate, and so that other sites can draw upon it. This is part of Harvard’s new Open Metadata policy which is VERY COOL." Cory

ICanStalkU twitterbot nags Twitter users about disclosing their location


ICanStalkU is a twitterbot, a Twitter-analyzing service that seeks out Twitter users who transmit their location in the photos they tweet and generates responses like "ICanStalkU was able to stalk @XXXXXXXXXX at http://maps.google.com/?q=35.5371666667,139.510166667," with the stated purpose of "Raising awareness about inadvertent information sharing."
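
For a sense of how little work this takes: cameras and phones typically store a photo's position in its EXIF metadata as degrees, minutes and seconds plus a hemisphere letter. The sketch below is not ICanStalkU's actual code, and the function names are hypothetical. It converts such a reading to decimal degrees and builds a link in the same form as the one quoted above. The sample values were chosen to approximate the coordinates in that quoted link, and reading the tags out of an image file is left out.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative in decimal notation.
    return -value if ref in ("S", "W") else value

def maps_url(lat, lon):
    """Build a map link in the same form as the one quoted in the post."""
    return f"http://maps.google.com/?q={lat:.6f},{lon:.6f}"

# Sample reading, assumed for illustration.
lat = dms_to_decimal(35, 32, 13.8, "N")
lon = dms_to_decimal(139, 30, 36.6, "E")
url = maps_url(lat, lon)
print(url)
```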

I generally like the idea of helping people understand that their software may be disclosing information about themselves that they're not aware of, but I find this method a little tiresome. On a few occasions, I've deliberately turned on location data when sending out an image (for example, when tweeting an image of a public event or artwork and wanting to conveniently attach a location to the tweet so others can find it), only to get chided, not by bots but by other Twitter users, who sent words to the effect of, "Some privacy advocate you are! Why are you sending location data with your images?"

I've also been nagged by someone's twitterbot that wanted to tell me off for including my email address in a tweet, because the author had decided that this would make me more vulnerable to spam (I have one email address and it's been public for about 15 years now -- there's no spambot that doesn't know it by now). It's nice that people want to help others understand the wider context of their actions, but there's a fine line between helping and nagging.

Adding location data to a photo of something in public -- a protest, a spectacle, a store -- isn't necessarily a privacy breach. Nor does it necessarily give information about the photographer's location (photographers might choose to post the images later, long after they've left that location). And location metadata on photos can be very useful. It would be great to see more nuance from ICanStalkU.

I Can Stalk U - Raising awareness about inadvertent information sharing (Thanks, Avi!)