Internet Archive to ignore robots.txt directives

Robots (or spiders, or crawlers) are little computer programs that search engines use to scan and index websites. Robots.txt is a little file placed on webservers to tell search engines what they should and shouldn't index. The Internet Archive isn't a search engine, but has historically obeyed exclusion requests from robots.txt files. But it's changing its mind, because robots.txt is almost always crafted with search engines in mind and rarely reflects the intentions of domain owners when it comes to archiving.

Over time we have observed that the robots.txt files that are geared toward search engine crawlers do not necessarily serve our archival purposes. Internet Archive’s goal is to create complete “snapshots” of web pages, including the duplicate content and the large versions of files. We have also seen an upsurge of the use of robots.txt files to remove entire domains from search engines when they transition from a live web site into a parked domain, which has historically also removed the entire domain from view in the Wayback Machine. In other words, a site goes out of business and then the parked domain is “blocked” from search engines and no one can look at the history of that site in the Wayback Machine anymore. We receive inquiries and complaints on these “disappeared” sites almost daily.

A few months ago we stopped referring to robots.txt files on U.S. government and military web sites for both crawling and displaying web pages (though we respond to removal requests sent to info@archive.org). As we have moved towards broader access it has not caused problems, which we take as a good sign.
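
For the curious, the parked-domain problem described above is easy to reproduce. What follows is a minimal sketch using Python's standard-library `urllib.robotparser`, not the Internet Archive's actual crawler logic. The two robots.txt bodies, the `ExampleBot` user agent, the `example.com` URLs, and the `is_allowed` helper are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a live site: hide one directory from all crawlers.
LIVE_SITE = """\
User-agent: *
Disallow: /drafts/
"""

# Hypothetical robots.txt for a parked domain: block everything for everyone.
PARKED_DOMAIN = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Under the live-site rules, a crawler that honors robots.txt may fetch `https://example.com/index.html` but not anything under `/drafts/`. Under the parked-domain rules, every URL on the domain is off limits, which is how a site's whole history could drop out of view.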


Internet Archive: "DRM for the Web is a Bad Idea"

Brewster Kahle, who invented the first two search engines and went on to found and run the Internet Archive, has published an open letter describing how the W3C's move to standardize DRM for the web without protecting otherwise legal acts, like archiving, will hurt the open web.

UC Berkeley nuked 20,000 Creative Commons lectures, but they're not going away

A ruling about a DC university held that posting course videos to the open web without subtitling them violated the Americans With Disabilities Act (while keeping them private to students did not) (I know: weird), and this prompted UC Berkeley to announce the impending removal of 20,000 open courseware videos from YouTube.

Software Heritage: Creating a safe haven for software

Software Heritage is an initiative that has made preserving software its main mission. Why is this important? Consider, if you will, the case of Gary Kildall's video game...

How likely is a future without paper?

Today we travel to a fully digital world, a world where paper is a thing of the past.

Flash Forward: RSS | iTunes | Twitter | Facebook | Web | Patreon

In this episode we talk about how likely it is that we might get rid of paper (not very) and what might happen to our reading habits, memories, and environment if we do.


Incredible archive of 49 million artworks, artifacts, books, videos, and sounds from across Europe

Josh Jones of Open Culture says, "Of all the archives I’ve surveyed, used in my own research, and presented to Open Culture readers, none has seemed to me vaster than Europeana Collections, a portal of '48,796,394 artworks, artefacts, books, videos and sounds from across Europe,' sourced from well over 100 institutions such as The European Library, Europhoto, the National Library of Finland, University College Dublin, Museo Galileo, and many, many more, including contributions from the public at large."

Image: Museu Nacional D'Art De Catalunya CC BY-NC-ND

500 hours of random VHS recordings condensed to five minutes

Bob Jaroc used to obsessively record long stretches of random TV on VHS cassettes.

14,000 drawings of the French Revolution posted online

Guillotines and numbing satire figure strongly in an archive of images from the French Revolution, made available by Stanford University and the Bibliothèque nationale de France.

About 14,000 high-resolution images are in the set, which is divided into Parliamentary Archives and Images of the French Revolution and neatly organized by event and category. [via Hyperallergic]

Play a digital version of a lost "perception-altering" Freemasonry board-game

Jason writes, "'The Bafflement Fires' is a digital recreation of a Freemason board game from the 1950s."

We have a memory problem

Video games have an issue with memory. Sometimes development and culture within the medium end up locked in obeisance to nostalgia, an assumed audience of pixel art and chiptune fans who really just want a Final Fantasy VII remake or yet another Legend of Zelda. At other times it's like we can't remember the past five years of history, routinely hailing "firsts" that have certainly been done before, or treating well-trod debates as if they were new conversations each time they simmer to the surface again.

According to the Internet Archive's Jason Scott, much of games' history risks being lost to the winds. So does a lot of writing and criticism -- many of us former contributors to the storied magazine Edge just found out that much of our online content has simply disappeared in the latest migration.

And as much as folks like me lament the lack of women's voices in games, or the absence of mainstream interest in games as a sophisticated form, the accomplished J.C. Herz was writing sophisticated game columns in the New York Times just 15 years ago, and I never even knew until recently. She also, like me, wrote a memoir of the 90s internet during the 1990s, not unlike my own recounting published just a couple years ago. We also have the same big curly hair. Anyone's time paradox gone missing?

"Older software is hard to get to," Scott said. Today, game developers could very well be throwing away history. "The thing about game and computer history is that it's both adored and ignored," he added.

No future for you: cultural institutions can't afford to play along with pointy-headed bosses

My new Guardian column, Go digital by all means, but don't bring the venture capitalists in to do it, is an open letter to the poor bastards who run public institutions, asking them to hold firm on delivering public value and not fall into the trap of running public services "like a business."

Library of Congress wants your Halloween/Dia de los Muertos/All Saints Day/All Souls Day photos

Trevor from the Library of Congress writes, "The American Folklife Center at the LOC is inviting Americans participating in holidays at the end of October and early November to photograph hayrides, haunted houses, parades, trick-or-treating and other celebratory and commemorative activities to contribute to a new collection documenting contemporary folklife."

Get 2600's archives from 1987

Emmanuel Goldstein from 2600 Magazine writes, "Volume 4 of The Hacker Digest has been put into PDF format, comprised of issues of 2600 Magazine from 1987."

This was the first year that 2600 adopted the digest format. For the first time ever, a hacker magazine would show up on newsstands and in bookstores around the world. New concepts such as cellular phone fraud and electronic mailboxes for $20 a month were introduced to the public and scrutinized in the pages of 2600, while traditions like the letters section, payphone photos, and 2600 meetings were in their infancy. The hacker spirit from these early issues is remarkably similar to that of today: defiant, curious, and overflowing with data.

VOLUME 4 OF THE HACKER DIGEST RELEASED ALONG WITH DETAILS ON ITS HISTORY

(Thanks, Emmanuel!)

As Office of US Courts withdraws records for five top benches, can we make them open?

Rogue archivist Carl Malamud writes, "The Administrative Office of the U.S. Courts has announced that they are removing the archives for 5 important courts from their infamous PACER system. PACER is the ten-cent-per-page access to U.S. District and Appeals courts dockets and opinions."

How It Works... The Computer (Ladybird books, 1978)

I found a copy of one of my favorite childhood books about computers. And now you can enjoy it too!

Vatican digitizing archives

Some poor devil has to scan in thousands of handwritten documents over the next four years—it's no wonder bags of cocaine are being intercepted by foreign customs on the way there.

Discordian archive rescued from dumpster, now online

Groucho Gandhi writes, "'Crackpot Historian' Adam Gorightly (the current Keeper of the Sacred Chao) saved the archives of Discordian co-founder Greg 'Malaclypse the Younger' Hill from the literal dustbin of history by swooping up the Hill archives as they were about to be tossed in the dumpster. Srsly!

"Why is this important? Greg Hill (thru Discordianism) created the first proto-zine, The Principia Discordia, and the precursor to the Creative Commons licensing scheme, first known as KopyLeft, All Rites Reversed. Now Adam Gorightly is taking the Discordian Archives and releasing them."

Historia Discordia (Thanks, Groucho!)
