Public Resource wants to liberate tax records for US nonprofits - converting 100lbs of scanned bitmaps on DVDs into searchable data on $1.5T worth of activity

Rogue archivist Carl Malamud sez,

On November 1, Public.Resource.Org released a new service which put 6,461,326 US nonprofit tax returns on the net for bulk download, developers, and search engines to access. We offered to give the working system to the government, and also sent them a few suggestions on ways they could better meet their mission and save themselves a boatload of money. Since then, we've been frantically trying to get the government's attention to take decisive action, but to no avail.

The way the government makes the nonprofit tax returns available to the public is broken in many ways. The IRS insists on selling the tax returns as a monthly feed of DVDs costing $2,580 per year. Each month, I get a stack of a dozen DVDs, each with 60,000 one-page TIFF files on it. This is just so lacking in clue, and even simple suggestions like using Dropbox instead of mailing us DVDs have been ignored.

In terms of breakage though, the truly big problem is the deliberate dumbing down of tax returns for large nonprofits in order to avoid what an IRS official actually said to us would be "too much transparency." All the big nonprofits have to e-file their tax returns. E-filing means they submit actual machine-processable data encoded in XML.

The way the IRS releases that information is mind-boggling. They image the data onto tax forms and then release them as 200 dot per inch TIFF files. So, instead of having a computer program extract the gross revenue, or the CEO salaries, or whether or not the nonprofit operates a tanning salon on premises (an actual question on the form!), you get something that is so bad that OCR is difficult. Nonprofits are a $1.5 trillion chunk of the U.S. economy, yet we're deliberately dumbing down data that could make that sector more efficient and more vibrant. That's dumb.
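To make the contrast concrete, here is a minimal sketch of what "having a computer program extract the gross revenue, or the CEO salaries" could look like if the e-filed XML were released. The element names (`GrossRevenue`, `Officer`, `OperatesTanningSalon`) and the sample values are invented for illustration and are not the actual IRS e-file schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical e-file fragment. Element names and figures are made up
# for illustration; they are NOT taken from the real IRS schema.
SAMPLE_RETURN = """
<Return>
  <Filer><Name>Example Charity</Name></Filer>
  <GrossRevenue>1250000</GrossRevenue>
  <Officer><Title>CEO</Title><Compensation>310000</Compensation></Officer>
  <Officer><Title>CFO</Title><Compensation>185000</Compensation></Officer>
  <OperatesTanningSalon>false</OperatesTanningSalon>
</Return>
"""


def summarize_return(xml_text):
    """Pull a few fields out of one (hypothetical) e-filed return."""
    root = ET.fromstring(xml_text.strip())
    return {
        "name": root.findtext("Filer/Name"),
        "gross_revenue": int(root.findtext("GrossRevenue")),
        # Map each officer title to reported compensation.
        "officer_pay": {
            officer.findtext("Title"): int(officer.findtext("Compensation"))
            for officer in root.findall("Officer")
        },
        "tanning_salon": root.findtext("OperatesTanningSalon") == "true",
    }


if __name__ == "__main__":
    print(summarize_return(SAMPLE_RETURN))
```

With structured data, each field is a direct lookup. With a 200 dpi TIFF, the same information first has to survive OCR.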

Since November, we've been trying to get the IRS and the Obama Administration to release this information, but they've refused. We've met with all sorts of IRS officials such as Lois Lerner and Joseph Grant of Tea Party fame, and we've also met with a ton of boldface names in the White House, such as Todd Park (the President's CTO) and Steve VanRoekel (the Federal CIO). Nobody will release the data. The IRS is worried the big nonprofits will be upset if information such as multimillion-dollar CEO salaries is more readily available.

Since discussion hasn't worked so far, we've retained the services of Thomas R. Burke, an eminent First Amendment attorney at Davis Wright Tremaine, and he's been working with our own counselor David Halperin. Today, they filed suit in the U.S. District Court for the Northern District of California. One reason we picked the Northern District is that they have a requirement that the parties try to work out their problems out of court using what is known as Alternative Dispute Resolution (ADR), which includes techniques such as mediation and arbitration. The ADR rules in this District Court require each party to bring to the mediation an official who has the authority to resolve this issue.

So, I'm reaching out to my good friends Todd Park and Steve VanRoekel, the architects of the President's great new machine-processable data directive, and I'm personally asking them to help us resolve this dispute with the administration. We're all on the same side here, let's work this out and get on with the real job at hand!

Our complaint in district court
Copies of our letters back and forth to the White House and the IRS
Sunlight Foundation: Nonprofit E-file Data Should Be Open
Think Progress: How the IRS Could Make it Easier to Track Dark Money, Right Now
Forbes: IRS: Turn Over a New Leaf, Open Up Data


  1. Maybe this is one of those issues where a WeThePeople petition would actually shine some light on the subject and maybe convince the White House to open up a bit?  

      1. I liked your comment for the intent, but we clearly have very different notions of “old”. So GET OFF MY LAWN!

  2. It’s only fair. I mean, the phone metadata and email metadata, and credit card transaction metadata they’re collecting on me is all in the form of 200dpi TIFF, right? Right? Right!!?

  3. I’ve got a bunch of blank rewritable CDs and DVDs just gathering dust… maybe we should scan handwritten letters to the IRS + exec, load them onto the discs and send them along. Maybe seeing a reflection of their ridiculousness in the shiny surface will help them see the light.

    1. I’ll type up an email, print it out and mail it to you so that you can scan it and put it on DVD.

  4. NoCal FTW!

    I propose we call this issue AOLGate.

    Seriously, though. I’m really hoping this works out and sets a precedent, but I expect more delay. It would not surprise me if the data reveals some embarrassingly massive but politically tricky tax shenanigans that have been generally known but low priority for the IRS for years.

    Should we start a pool on how long it will take before the administration claims it’s a matter of national security somehow?

    Maybe each of those DVDs is a special anti-terror frisbee?

  5. You’re looking at it from a practical user standpoint. From the point of view of the government, the $2,580 per year you spend employs the folks that run the operation. They would be putting themselves out of a job if they made the information easily accessible online. That’s how most government agencies think anyway.

    1. Actually, the IRS has ergonomics people on staff, and they demand their managers find ways to cut waste and create efficiencies. Obviously, these discs are part of a program that nobody has looked at in a long while.

      But remember that 20% of the IRS is laid off right now, thanks to the sequester, and the head of TGTE is on “administrative leave” because the Tea Party said so. Also, they’ve been badly underfunded since Reagan decided they were the problem. Much harder to get things done under these conditions.

      The petition is actually a great idea. Management pays attention to those at every government agency. Also, the IRS has a Taxpayer Advocate office that might be able to get some change – they have a fair amount of clout.

      1. Perhaps I was too harsh towards IRS employees.  On the other hand, the article makes it sound like a truly useful idea is either being ignored or squashed by the administration.

        It would be nice if the government would just do the right thing by default instead of needing a petition.

  6. Wow, a Cluetrain reference. 

    Setting my boss’s homepage to Cluetrain may or may not have contributed to my early departure from that job, circa 1999. He did not know how to change it.

    Seriously, this is just the .gov at work. Obfuscation, obstruction, complication. Techniques that only equal job security for the rather mundane personnel that low-level .gov jobs attract.

    I would love to see the government employees (at any level) held to the same standards that private business requires.

    1. You mean like Comcast, Halliburton, Blackwater, AT&T, Facebook, Enron, Time Warner, BP, Bank of America…?  

      1. No no, the magical corporation where everybody is awesome and there is no waste, fraud, or abuse. The one where everybody is paid what they are worth, and nobody ever steals the pension fund. You know, the one where life is fair.

  7. If the government is so enthralled with paying people to do all this pointless work to make DVDs, then perhaps they could pay some other people to OCR the TIFF files and post the actual numbers as HTML to the public on their website. 

  8. I’m trying to wrap my head around situations where “too much transparency” is a bad thing.  What exactly is the concern?

    That burglars would use the data to fish for well-paid non-profit executives whose houses they could rob?  I’d think the XML Edgar data for public companies would be much more fruitful.

    That it would be too easy to make lists like this that shame bad charities?

    Seriously, I’m trying to understand the counterargument.

  9. Or, you could go to GuideStar, where you can view nonprofits’ IRS Form 990s and other financial returns that show most if not all of the financial activities, payroll, etc. That is what I do, instead of ordering the full DVD set.
