Rogue archivist Carl Malamud sez,
On November 1, Public.Resource.Org released a new service that put 6,461,326 US nonprofit tax returns on the net, available for bulk download and open to developers and search engines. We offered to give the working system to the government, and also sent them a few suggestions on ways they could better meet their mission and save themselves a boatload of money. Since then, we've been frantically trying to get the government's attention and get them to take decisive action, but to no avail.
The way the government makes the nonprofit tax returns available to the public is broken in many ways. The IRS insists on selling the tax returns as a monthly feed of DVDs costing $2,580 per year. Each month, I get a stack of a dozen DVDs, each holding 60,000 one-page TIFF files. This is just so lacking in clue, and even simple suggestions like using Dropbox instead of mailing us DVDs have been ignored.
In terms of breakage though, the truly big problem is the deliberate dumbing down of tax returns for large nonprofits in order to avoid what an IRS official actually said to us would be "too much transparency." All the big nonprofits have to e-file their tax returns. E-filing means they submit actual machine-processable data encoded in XML.
The way the IRS releases that information is mind-boggling. They image the data onto tax forms and then release them as 200-dot-per-inch TIFF files. So, instead of having a computer program extract the gross revenue, or the CEO salaries, or whether or not the nonprofit operates a tanning salon on premises (an actual question on the form!), you get an image of such poor quality that even OCR is difficult.
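To give a rough sense of what machine-processable data would make possible, here is a minimal Python sketch. It is only an illustration: the sample document and every element name in it (`Return`, `GrossRevenue`, `Officer`, `OperatesTanningSalon`) are invented for this example and are not the actual IRS e-file schema, and `summarize_return` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical e-filed return. The element names are invented for
# illustration and do NOT reflect the real IRS e-file schema.
SAMPLE_RETURN = """
<Return>
  <Organization>Example Nonprofit</Organization>
  <GrossRevenue>1250000</GrossRevenue>
  <Officer>
    <Title>CEO</Title>
    <Compensation>180000</Compensation>
  </Officer>
  <Officer>
    <Title>Treasurer</Title>
    <Compensation>95000</Compensation>
  </Officer>
  <OperatesTanningSalon>false</OperatesTanningSalon>
</Return>
"""


def summarize_return(xml_text):
    """Pull a few fields out of a (hypothetical) XML tax return."""
    root = ET.fromstring(xml_text)

    # Collect compensation for every officer whose title is exactly "CEO".
    ceo_salaries = [
        int(officer.findtext("Compensation"))
        for officer in root.findall("Officer")
        if officer.findtext("Title") == "CEO"
    ]

    return {
        "organization": root.findtext("Organization"),
        "gross_revenue": int(root.findtext("GrossRevenue")),
        "ceo_salaries": ceo_salaries,
        "tanning_salon": root.findtext("OperatesTanningSalon") == "true",
    }


if __name__ == "__main__":
    print(summarize_return(SAMPLE_RETURN))
```

With structured data, a lookup like this is a few lines of code per field. With a scanned TIFF, the same question first requires OCR and then guessing which recognized number belongs to which box on the form.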