The folks at the Allen Institute for Artificial Intelligence just released an intriguing tool, "Paper to HTML", which lets you upload a scientific paper and turns it into an HTML web page.
The goal, as they wrote in their email, is to improve accessibility, since screen readers and other accessibility tech usually find it a lot easier to parse HTML than PDFs:
"This week, a team of researchers and engineers led by Lucy Lu Wang released a prototype of their tool that converts scientific PDFs into HTML, making them readable by screen readers and much more easily visible on mobile devices. After learning that fewer than 3% of scientific papers meet minimum criteria for accessibility, AI2 is pursuing new and better ways to make scientific publishing accessible to the broadest possible audience."
I uploaded a scientific paper I was recently reading and damn, the tool did a bang-up job. The screenshot above shows the HTML it generated.
This will also make it easier for me to clip stuff from PDFs. Right now, most of the time when I cut and paste from a PDF paper into, say, Google Docs or Word, the text comes out all chopped up with line breaks. But this tool renders each paragraph as a single <p> element of HTML, which cuts and pastes as one complete bolus of text. I'm in!
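If you'd rather script the clipping than do it by hand, here's a minimal sketch of the idea in Python, using only the standard library's `html.parser`. It pulls the text out of each <p> element and collapses any stray line breaks and extra spaces into single spaces. The class and function names (`ParagraphExtractor`, `extract_paragraphs`) and the sample HTML are my own inventions for illustration; I'm not assuming anything about the markup Paper to HTML actually produces beyond the paragraphs being <p> elements.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text of each <p> element as one whitespace-normalized string."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # finished paragraphs, in document order
        self._in_p = False    # True while we are inside a <p> element
        self._chunks = []     # text fragments of the current paragraph

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            # Join the fragments, then collapse newlines and repeated spaces.
            text = " ".join("".join(self._chunks).split())
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        # Text inside inline tags such as <em> also arrives here.
        if self._in_p:
            self._chunks.append(data)


def extract_paragraphs(html):
    """Return a list with one clean string per <p> element in `html`."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


if __name__ == "__main__":
    # Made-up sample markup, not real Paper to HTML output.
    sample = "<p>Screen readers parse\nHTML more   easily.</p><p>Second <em>para</em>.</p>"
    for paragraph in extract_paragraphs(sample):
        print(paragraph)
```

Each string in the returned list is one whole paragraph, ready to drop into a doc without the chopped-up line breaks.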