Tell the IRS that mountains of DVDs are a stupid way to distribute public records

Rogue archivist Carl Malamud sez, "I just finished ripping 30 DVDs from the IRS. This is the monthly feed of nonprofit tax returns. I now have 7,442,564 of these returns spinning on the net. I've had it. This year, the IRS upped the cost of this feed to $2,910. I've already spent $16,137 on this brain-dead format. For 2 years, I've been writing to the IRS to suggest better ways. Dropbox anybody? An FTP server?"

IRS database of nonprofits is filled with unredacted SSNs

Remember when rogue archivist Carl Malamud asked the IRS for data covering $1.5 trillion worth of activity by nonprofit organizations? Well, it turns out that the IRS totally failed to redact it properly and left in the Social Security Numbers of thousands of people. So Public.Resource.Org has asked the IRS to take the database down and get it right. He explains:

Public.Resource.Org has issued a statement explaining why we asked the I.R.S. to temporarily take their political money database off the Internet and why they complied with our request. This database is a vital tool for researchers, and we apologize to those of you who use it on a daily basis.

This is only one of several exempt organization databases that the IRS has totally bungled. They've become addicted to bad Internet hygiene and it is time now for the Service to admit it needs help.

We deserve better for the public filings of exempt organizations, a category that makes up 10% of US wages and over $1.5 trillion in economic activity. Let's hope the administration takes this seriously and sends in the A team.

Why We Asked the I.R.S. to Temporarily Turn the Lights Off on Section 527 Data (Thanks, Carl!)

Public Resource wants to liberate tax records for US nonprofits: converting 100 lbs of scanned bitmaps on DVDs into searchable data on $1.5T worth of activity

Rogue archivist Carl Malamud sez,

On November 1, Public.Resource.Org released a new service which put 6,461,326 US nonprofit tax returns on the net for bulk download, developers, and search engines to access. We offered to give the working system to the government, and also sent them a few suggestions on ways they could better meet their mission and save themselves a boatload of money. Since then, we've been frantically trying to get the government's attention to take decisive action, but to no avail.

The way the government makes the nonprofit tax returns available to the public is broken in many ways. The IRS insists on selling the tax returns as a monthly feed of DVDs costing $2,580 per year. Each month, I get a stack of a dozen DVDs, each holding 60,000 one-page TIFF files. This is just so lacking in clue, and even simple suggestions like using Dropbox instead of mailing us DVDs have been ignored.

In terms of breakage though, the truly big problem is the deliberate dumbing down of tax returns for large nonprofits in order to avoid what an IRS official actually said to us would be "too much transparency." All the big nonprofits have to e-file their tax returns. E-filing means they submit actual machine-processable data encoded in XML.

The way the IRS releases that information is mind-boggling. They image the data onto tax forms and then release them as 200-dots-per-inch TIFF files. So, instead of having a computer program extract the gross revenue, or the CEO salaries, or whether or not the nonprofit operates a tanning salon on premises (an actual question on the form!), you get images so poor that even OCR struggles with them.
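To make the contrast concrete, here is a minimal sketch of what "having a computer program extract" those fields could look like if the e-filed XML were released directly. It assumes a hypothetical, heavily simplified return: the element names (`GrossRevenue`, `Officer`, `Compensation`, `OperatesTanningSalon`) and the sample values are invented for illustration and are not the real IRS e-file schema. Only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL, simplified e-filed return. Element names and values are
# illustrative only; they are NOT the actual IRS e-file schema.
SAMPLE_RETURN = """
<Return>
  <Filer><Name>Example Charity</Name></Filer>
  <ReturnData>
    <GrossRevenue>1250000</GrossRevenue>
    <Officer>
      <Name>Jane Doe</Name>
      <Title>CEO</Title>
      <Compensation>185000</Compensation>
    </Officer>
    <OperatesTanningSalon>false</OperatesTanningSalon>
  </ReturnData>
</Return>
"""


def extract_summary(xml_text: str) -> dict:
    """Pull a few fields out of the (hypothetical) XML return."""
    root = ET.fromstring(xml_text.strip())
    # Every <Officer> element anywhere in the document becomes one record.
    officers = [
        {
            "name": officer.findtext("Name"),
            "title": officer.findtext("Title"),
            "compensation": int(officer.findtext("Compensation")),
        }
        for officer in root.iter("Officer")
    ]
    return {
        "name": root.findtext("Filer/Name"),
        "gross_revenue": int(root.findtext("ReturnData/GrossRevenue")),
        "officers": officers,
        # Yes/no questions become real booleans instead of pixels.
        "tanning_salon": root.findtext("ReturnData/OperatesTanningSalon") == "true",
    }


if __name__ == "__main__":
    summary = extract_summary(SAMPLE_RETURN)
    print(summary["gross_revenue"], summary["officers"][0]["compensation"])
```

With the data as XML, each field is a single lookup; with a 200 dpi TIFF, the same numbers have to be recovered by OCR, with all the errors that implies.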

The longest-running open source project: US Federal Depository libraries

The Federal Depository Library Program (FDLP) is a geographically dispersed network of 1,250 libraries around the US that for over 150 years have worked with the Government Printing Office (GPO) to ensure that government information is deposited in local libraries and freely available to everyone. FDLP libraries have also assured the authenticity of government information through this distributed system. Documents librarians have long been advocates for government transparency, freedom of information, privacy, and civil liberties (the freedom to read, etc.).