Simon writes, "I recently got a chance to interview and profile the people behind a collaboration between Smithsonian and the Harvard College Observatory who are crowdsourcing the transcription of logbooks for thousands of photographic plates. It's a massive undertaking that will give scientists access to a hundred years of astronomical data."
The project, called Digital Access to a Sky Century at Harvard (DASCH for short), is a collaboration between the Harvard College Observatory and the Smithsonian Institution. The Smithsonian has also embarked on a much larger effort to crowdsource the transcription of millions of pages of archival material. The DC-based network of museums launched a beta version of its crowdsourcing platform in June 2013, and over the following year about 1,000 volunteers transcribed 13,000 pages of documents. Since the platform emerged from beta in August and opened to the public, the volunteer list has swelled to more than 4,000. The Smithsonian isn't the first archival institution to digitize its collections (everyone from the New York Public Library to the British Museum has launched similar initiatives), but its scope is arguably the largest: there are 137 million objects spread across the Smithsonian's 19 museums.
But in a world where Google can scan millions of library books and make their text searchable, why can't the Smithsonian do the same? "Basically these documents are not able to be recognized by [optical character recognition] text readers," said Sarah Sulick, a public affairs specialist at the Smithsonian. "You feed it through a scanner and it can read all the script on the page, but for these handwritten documents it just can't do that. We actually need a human for this to happen."