42,000 Mozilla supporters contributed to Common Voice, a free and open dataset of 1,361 hours of voice recordings in 18 languages. It is now available for anyone to use as a set of "high quality, transcribed voice data... available to startups, researchers, and anyone interested in voice-enabled technologies." In a field plagued by sampling-bias problems, this dataset aims to be diverse, representative and inclusive, and it's growing by the day (you can contribute your voice too!). The whole project is inspiring. (via Four Short Links)
Read the rest “Common Voice: Mozilla releases the largest dataset of voice samples for free, for all”
In 2009, Obama signed an executive order requiring the executive branch to embrace the broadest, most liberal approach to the Freedom of Information Act, reversing John Ashcroft's 2001 memo that instructed government agencies to turn over as little information to the public as possible.
Read the rest “In a huge win for open data, Congress passes the Open, Public, Electronic, and Necessary Government Data Act”
The Electronic Frontier Foundation and Muckrock teamed up to use the Freedom of Information Act to extract the details of 200 US cities' Automated License Plate Recognition (ALPR) camera programs. Today they've released a dataset containing all the heretofore secret data on how these programs are administered and what is done with the data they collect.
Read the rest “Here's the secret details of 200 cities' license-plate tracking programs”
Matt Chapman used the Freedom of Information Act to get the City of Chicago's very messy parking ticket data. After enormous and heroic data normalization, Chapman was able to pinpoint one of the city's most confusing parking spots, at 1100-1166 N State St, which cycled between duty as a taxi stand and a parking spot with a confusingly placed and semi-busted parking meter.
Read the rest “How a civic hacker used open data to halve tickets at Chicago's most confusing parking spot”
When researchers write, we don't just describe new findings -- we place them in context by citing the work of others. Citations trace the lineage of ideas, connecting disparate lines of scholarship into a cohesive body of knowledge, and forming the basis of how we know what we know.
The Australian government's open data initiative is in the laudable business of publishing publicly accessible data about the government's actions and spending, in order to help scholars, businesses and officials understand and improve its processes.
Read the rest “The Australian health authority believed it had "anonymised" a data-set of patient histories, but academics were easily able to unscramble it”
Microsoft co-founder Paul Allen funded the Allen Brain Observatory, a detailed, rich dataset derived from parts of a mouse brain. What's striking is that the Allen Institute released all the data into the public domain, at once, as soon as it was available. That is exactly what you'd want the publicly funded alternatives to do, and what they almost never do. Read the rest “Why did it take a private foundation to do public science right?”
In a lead editorial in the current Nature, John Wilbanks (formerly head of Science Commons, now "Chief Commons Officer" for Sage Bionetworks) and Eric Topol (professor of genomics at the Scripps Institute) decry the mass privatization of health data by tech startups. These companies use a combination of side-deals with health authorities and insurers and technological lockups to amass huge databases of vital health information that is not copyrighted or copyrightable, but is nevertheless walled off from open research, investigation and replication. Read the rest “Our public health data is being ingested into Silicon Valley's gaping, proprietary maw”
The Transatlantic Trade and Investment Partnership is an EU-US "trade agreement" that will allow corporations to sue governments in secret tribunals to force them to repeal their safety, environmental and labor laws. Read the rest “Revealed: the hidden web of big-business money backing Europe and America's pro-TTIP "think tanks"”
Christian writes, "Come and participate in 24 hours to hack for Hackney: we will be doing the first Hack-ney-thon to try and improve how the council works and life in the borough. To help with this the council has started a Github account. Read the rest “Hackney hackathon in London this weekend”
Rogue archivist Carl Malamud sez, "I just finished ripping 30 DVDs from the IRS. This is the monthly feed of nonprofit tax returns. I now have 7,442,564 of these returns spinning on the net. I've had it.
This year, the IRS upped the cost of this feed to $2910. I've already spent $16,137 on this brain dead format. For 2 years, I've been writing to the IRS to suggest better ways. Dropbox anybody? An FTP server?" Read the rest “Tell the IRS that mountains of DVDs are a stupid way to distribute public records”
Rogue archivist Carl Malamud sez,
If you want access to all the tax filings of US nonprofit corporations, the IRS will sell you sets of DVDs for $2580 per year of data. We acquired all of these filings from 2002 to the present, a set of DVDs weighing 98.7 pounds. I'm pleased to report that all 6,461,326 of those returns are now successfully extracted and available on our new bulk data feed.
This data really should be available directly from the IRS at no charge. Accordingly, we've drafted a deed of gift offering the system back to the government.
Until the .gov people do take it over, we're offering access to all 5 TBytes of data using the http, ftp, and rsync protocols. Our hope is that developers will come up with lots of new uses for this information. In order to make the database even more useful, we've started working with Captricity to extract data from the forms and make it available as computable data (e.g., CSV files instead of TIFF images!).
Once search engines such as Google finish indexing the data, the tax filings of nonprofits will show up in the search results. When you search for a nonprofit, the first thing you see ought to be their home page. But the next thing you ought to see are things like how much they pay their CEO, how much revenue goes for fundraising, and if they spend money to lobby public officials.
Nonprofits in the US had $1.87 trillion in 2009 revenues and it is these periodic filings that make the nonprofit marketplace work properly, just like SEC EDGAR filings help make the corporate markets work properly.
Read the rest “Tax returns for 6,461,326 tax-exempt organizations now indexable by search engines and available for free downloads, thanks to Resource.org”
Adam sez, "The first Open-data Cities Conference takes place in Brighton, England next week. It's aimed at local councils and government agencies who want to open up more of their datasets, and it will give them ideas and practical help on how to do it. There are some good speakers, including Tom Steinberg from MySociety and Rufus Pollock from the Open Knowledge Foundation."
The high-profile conference – the first of its kind in the United Kingdom – will focus on how publicly-funded organisations can engage with citizens to build more creative, prosperous and accountable communities.
It will be attended by more than 200 people who believe the value of public data is greatest when it is freely and openly shared. They will be leaders from the public sector, arts and cultural organisations, and creative and digital industries.
The focus will be on the opportunities to improve the lives of more than 10 million citizens in the UK’s biggest cities.
Open-data Cities Conference (Thanks, Adam!) Read the rest “Open-data Cities Conference in Brighton, England: turning municipal governments into open data collaborators”
Here's a terrific article by Gilles Frydman at e-patients.net arguing against H.R. 3699, aka The Research Works Act (RWA). The bill before Congress would seriously impede "the ability of patients and caregivers, researchers, physicians and healthcare professionals to access and use critical health-related information in a timely manner." (@timoreilly via @epatientdave) Read the rest “Open medical knowledge saves lives: Oppose H.R. 3699”