An Indian research university has assembled 73 million journal articles (without permission) and is offering the archive for unfettered scientific text-mining

The JNU Data Depot is a joint project between rogue archivist Carl Malamud, bioinformatician Andrew Lynn, and a research team from New Delhi's Jawaharlal Nehru University: together, they have assembled 73 million journal articles from 1847 to the present day and put them into an air-gapped repository that they're offering to noncommercial third parties who want to perform textual analysis on them to "pull out insights without actually reading the text."

Text-mining journalists find that lawmakers introduced 10,000 bills that were copypasted from lobbyists' "model legislation"

For two years, researchers from USA Today, The Arizona Republic and the Center for Public Integrity ingested the bills introduced in all 50 state legislatures, yielding a corpus of more than 1,000,000 bills. They then spent months of computer time on a large cluster comparing these bills to "model legislation" promoted by lobbyists, using a text-mining engine that could identify paraphrases, synonyms, and other techniques used to file the serial numbers off of these bills.
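To give a rough feel for this kind of comparison, here is a minimal sketch that scores how much wording two texts share, using overlapping three-word "shingles" and Jaccard similarity. This is an assumption for illustration only, not the engine the reporters used: the function names `shingles` and `jaccard`, the shingle size, and the sample sentences are all hypothetical, and plain shingle overlap catches copied wording but not the paraphrases or synonyms mentioned above.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of two texts' shingle sets: 0.0 (disjoint) to 1.0 (identical)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        # A text shorter than k words has no shingles to compare.
        return 0.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical sample texts, invented for this sketch.
model = "An employer shall not be required to provide paid sick leave to any employee."
bill = "An employer shall not be required to provide paid sick leave to any part-time worker."
unrelated = "The state bird shall be the northern cardinal."

print(jaccard(model, bill))
print(jaccard(model, unrelated))
```

A bill that scores well above unrelated text would be a candidate for closer human review; at the scale of a million bills, every bill has to be scored against every model text, which hints at why the real analysis took months of cluster time.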

Trump only writes the angry tweets; the nice ones are written by a staffer with an iPhone

On August 6, artist Todd Vaziri observed that all of Trump's angry tweets come from the Twitter client for Android, while the more presidential, less batshit ones come from an iPhone; Vaziri speculated that the latter were sent by a staffer.