Tell the IRS that mountains of DVDs are a stupid way to distribute public records

Rogue archivist Carl Malamud sez, "I just finished ripping 30 DVDs from the IRS. This is the monthly feed of nonprofit tax returns. I now have 7,442,564 of these returns spinning on the net. I've had it. This year, the IRS upped the cost of this feed to $2910. I've already spent $16,137 on this brain dead format. For 2 years, I've been writing to the IRS to suggest better ways. Dropbox anybody? An FTP server?"


IRS database of nonprofits is filled with unredacted SSNs


Remember when rogue archivist Carl Malamud asked the IRS for data on $1.5 trillion worth of activity by nonprofit organizations? Well, it turns out that the IRS has totally failed to redact it properly, and left in the Social Security Numbers of thousands of people. So he's asked the IRS to take the database down and get it right. He explains:

Public.Resource.Org has issued a statement explaining why we asked the I.R.S. to temporarily take their political money database off the Internet and why they complied with our request. This database is a vital tool for researchers and we apologize to those of you that use this database on a daily basis.

This is only one of several exempt organization databases that the IRS has totally bungled. They've become addicted to bad Internet hygiene and it is time now for the Service to admit it needs help.

We deserve better for the public filings of exempt organizations, a category that makes up 10% of US wages and over $1.5 trillion in economic activity. Let's hope the administration takes this seriously and sends in the A team.

Why We Asked the I.R.S. to Temporarily Turn the Lights Off on Section 527 Data (Thanks, Carl!)

Public Resource wants to liberate tax records for US nonprofits - converting 100lbs of scanned bitmaps on DVDs into searchable data on $1.5T worth of activity


Rogue archivist Carl Malamud sez,

On November 1, Public.Resource.Org released a new service which put 6,461,326 US nonprofit tax returns on the net for bulk download, developers, and search engines to access. We offered to give the working system to the government, and also sent them a few suggestions on ways they could better meet their mission and save themselves a boatload of money. Since then, we've been frantically trying to get the government's attention to take decisive action, but to no avail.

The way the government makes the nonprofit tax returns available to the public is broken in many ways. The IRS insists on selling the tax returns as a monthly feed of DVDs costing $2,580 per year. Each month, I get a stack of a dozen DVDs, each holding 60,000 1-page TIFF files. This is just so lacking in clue, and even simple suggestions like using Dropbox instead of mailing us DVDs have been ignored.

In terms of breakage though, the truly big problem is the deliberate dumbing down of tax returns for large nonprofits in order to avoid what an IRS official actually said to us would be "too much transparency." All the big nonprofits have to e-file their tax returns. E-filing means they submit actual machine-processable data encoded in XML.

The way the IRS releases that information is mind-boggling. They image the data onto tax forms and then release them as 200 dot per inch TIFF files. So, instead of having a computer program extract the gross revenue, or the CEO salaries, or whether or not the nonprofit operates a tanning salon on premises (an actual question on the form!), you get something that is so bad that OCR is difficult. Nonprofits are a $1.5 trillion chunk of the U.S. economy, yet we're deliberately dumbing down data that could make that sector more efficient and more vibrant. That's dumb.
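To illustrate the point about machine-processable data, here is a minimal sketch of how little code it takes to pull figures out of XML compared with running OCR on scanned images. The element names (`GrossRevenue`, `Officer`, `Compensation`) and the sample return are invented for illustration; they are assumptions and do not reflect the actual IRS e-file schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical e-file fragment. Element names and values are made up
# for this example and do not follow the real IRS schema.
SAMPLE_RETURN = """
<Return>
  <Filer><Name>Example Charity</Name></Filer>
  <GrossRevenue>1250000</GrossRevenue>
  <Officer>
    <Name>Jane Doe</Name>
    <Title>CEO</Title>
    <Compensation>310000</Compensation>
  </Officer>
  <Officer>
    <Name>John Roe</Name>
    <Title>CFO</Title>
    <Compensation>185000</Compensation>
  </Officer>
</Return>
"""


def summarize_return(xml_text):
    """Extract the filer name, gross revenue, and officer pay from one return."""
    root = ET.fromstring(xml_text.strip())
    officers = [
        (o.findtext("Name"), o.findtext("Title"), int(o.findtext("Compensation")))
        for o in root.findall("Officer")
    ]
    return {
        "name": root.findtext("Filer/Name"),
        "gross_revenue": int(root.findtext("GrossRevenue")),
        "officers": officers,
    }


summary = summarize_return(SAMPLE_RETURN)
print(summary["name"], summary["gross_revenue"])
for name, title, pay in summary["officers"]:
    print(title, name, pay)
```

With structured data, each field is a direct lookup; with 200 dpi TIFF images, the same numbers have to be recovered by OCR and then checked by hand.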

Since November, we've been trying to get the IRS and the Obama Administration to release this information, but they've refused. We've met with all sorts of IRS officials such as Lois Lerner and Joseph Grant of Tea Party fame, and we've also met with a ton of boldface names in the White House, such as Todd Park (the President's CTO) and Steve VanRoekel (the Federal CIO). Nobody will release the data. The IRS is worried the big nonprofits will be upset if information such as multimillion-dollar CEO salaries is more readily available.

Since discussion hasn't worked so far, we've retained the services of Thomas R. Burke, an eminent First Amendment attorney at Davis Wright Tremaine, and he's been working with our own counselor David Halperin. Today, they filed suit in the U.S. District Court for the Northern District of California. One reason we picked the Northern District is that it requires the parties to try to work out their problems out of court using what is known as Alternative Dispute Resolution (ADR), which includes techniques such as mediation and arbitration. The ADR rules in this District Court require each party to bring to the mediation an official who has the authority to resolve the issue.

So, I'm reaching out to my good friends Todd Park and Steve VanRoekel, the architects of the President's great new machine-processable data directive, and I'm personally asking them to help us resolve this dispute with the administration. We're all on the same side here, let's work this out and get on with the real job at hand!

Links:
Our complaint in district court
Copies of our letters back and forth to the White House and the IRS
Sunlight Foundation: Nonprofit E-file Data Should Be Open
Think Progress: How the IRS Could Make it Easier to Track Dark Money, Right Now
Forbes: IRS: Turn Over a New Leaf, Open Up Data

The longest-running open source project: US Federal Depository libraries

The Federal Depository Library Program (FDLP) is a geographically dispersed network of 1250 libraries around the US that for over 150 years have worked with the Government Printing Office (GPO) to ensure that government information is deposited in local libraries and freely available to everyone. FDLP libraries have also assured the authenticity of government information through this distributed system. Documents librarians have long been advocates for government transparency, freedom of information, privacy and civil liberties (freedom to read etc).
