Public Resource wants to liberate tax records for US nonprofits – converting 100lbs of scanned bitmaps on DVDs into searchable data on $1.5T worth of activity

Rogue archivist Carl Malamud sez,

On November 1, Public.Resource.Org released a new service that put 6,461,326 US nonprofit tax returns on the net, available for bulk download and open to developers and search engines. We offered to give the working system to the government, and also sent them a few suggestions on ways they could better meet their mission and save themselves a boatload of money. Since then, we've been frantically trying to get the government to take decisive action, but to no avail.

The way the government makes the nonprofit tax returns available to the public is broken in many ways. The IRS insists on selling the tax returns as a monthly feed of DVDs costing $2,580 per year. Each month, I get a stack of a dozen DVDs, each holding 60,000 one-page TIFF files. This is just so lacking in clue, and even simple suggestions like using Dropbox instead of mailing us DVDs have been ignored.

In terms of breakage though, the truly big problem is the deliberate dumbing down of tax returns for large nonprofits in order to avoid what an IRS official actually said to us would be "too much transparency." All the big nonprofits have to e-file their tax returns. E-filing means they submit actual machine-processable data encoded in XML.

The way the IRS releases that information is mind-boggling. They image the data onto tax forms and then release them as 200-dot-per-inch TIFF files. So, instead of having a computer program extract the gross revenue, or the CEO salaries, or whether or not the nonprofit operates a tanning salon on premises (an actual question on the form!), you get something that is so bad that OCR is difficult. Nonprofits are a $1.5 trillion chunk of the U.S. economy, yet we're deliberately dumbing down data that could make that sector more efficient and more vibrant. That's dumb.
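
To make the contrast concrete, here is a minimal sketch of what "having a computer program extract the gross revenue, or the CEO salaries" could look like if the e-filed XML were released as is. The sample return and every element name in it (Return, OrganizationName, GrossRevenue, Officer, Compensation) are invented for illustration and are not the actual IRS e-file schema; the helper name summarize_return is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical e-filed return. These element names are made up for this
# sketch and do NOT reflect the real IRS e-file schema.
SAMPLE_RETURN = """
<Return>
  <OrganizationName>Example Charity</OrganizationName>
  <GrossRevenue>1250000</GrossRevenue>
  <Officer>
    <Name>Jane Doe</Name>
    <Title>CEO</Title>
    <Compensation>310000</Compensation>
  </Officer>
  <Officer>
    <Name>John Roe</Name>
    <Title>Treasurer</Title>
    <Compensation>95000</Compensation>
  </Officer>
</Return>
"""


def summarize_return(xml_text):
    """Pull a few fields of interest out of one (hypothetical) XML return."""
    root = ET.fromstring(xml_text.strip())
    # Collect (name, title, compensation) for every listed officer.
    officers = [
        (o.findtext("Name"), o.findtext("Title"), int(o.findtext("Compensation")))
        for o in root.findall("Officer")
    ]
    return {
        "name": root.findtext("OrganizationName"),
        "gross_revenue": int(root.findtext("GrossRevenue")),
        "officers": officers,
    }


summary = summarize_return(SAMPLE_RETURN)
print(summary["name"], summary["gross_revenue"])
for name, title, pay in summary["officers"]:
    print(title, name, pay)
```

With structured data, each field is a direct lookup. Getting the same numbers out of a scanned TIFF would require OCR plus form-layout guessing, which is exactly the step the image-only release forces on everyone.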

Since November, we've been trying to get the IRS and the Obama Administration to release this information, but they've refused. We've met with all sorts of IRS officials such as Lois Lerner and Joseph Grant of Tea Party fame, and we've also met with a ton of boldface names in the White House, such as Todd Park (the President's CTO) and Steve VanRoekel (the Federal CIO). Nobody will release the data. The IRS is worried the big nonprofits will be upset if information such as multimillion-dollar CEO salaries is more readily available.

Since discussion hasn't worked so far, we've retained the services of Thomas R. Burke, an eminent First Amendment attorney at Davis Wright Tremaine, and he's been working with our own counselor David Halperin. Today, they filed suit in the U.S. District Court for the Northern District of California. One reason we picked the Northern District is that it requires the parties to try to work out their problems out of court using what is known as Alternative Dispute Resolution (ADR), which includes techniques such as mediation and arbitration. The ADR rules in this District Court require each party to bring to the mediation an official who has the authority to resolve this issue.

So, I'm reaching out to my good friends Todd Park and Steve VanRoekel, the architects of the President's great new machine-processable data directive, and I'm personally asking them to help us resolve this dispute with the administration. We're all on the same side here. Let's work this out and get on with the real job at hand!

Our complaint in district court
Copies of our letters back and forth to the White House and the IRS
Sunlight Foundation: Nonprofit E-file Data Should Be Open
Think Progress: How the IRS Could Make it Easier to Track Dark Money, Right Now
Forbes: IRS: Turn Over a New Leaf, Open Up Data