Get a copy of the Web

Want 80 terabytes of web-crawl? The Internet Archive will give you a copy of (an appreciable slice of) the Web, for research purposes: "we would like to experiment with offering access to one of our crawls from 2011 with about 80 terabytes of WARC files containing captures of about 2.7 billion URIs. The files contain text content and any media that we were able to capture, including images, flash, videos, etc."

14 Responses to “Get a copy of the Web”

  1. So that’s like 79 terabytes of porn and some misc. news sites right?

  2. Jake0748 says:

    It reminds me of the (really) old Dilbert cartoon where the pointy-haired boss asks Dilbert if he would download a copy of the internet and put it on a floppy for him.

  3. howaboutthisdangit says:

    It’s one thing to have an offline copy of a website or wiki, but this….

    I’ve got about 600G available on my primary drive.  I wonder if I can free up another 79.5T?

  4. nowimnothing says:

    Is it weird that 79 terabytes seems small? Maybe the whole 1999 web would fit on that, but 2008 with photo-sharing websites?
    I have a 2T media server and I am working on fitting my DVD library on it. I know it will run out of room before I am done.

    • Tom Hunter says:

      Well, not everyone can be as special as you to have over 2 TB of movies…98% of which you will never watch again. 

      • Matthew11 says:

        Nope, most of us have 10+TB of movies. I’m currently trying to convert my entire collection of Blu-rays, DVDs, and digital media into a more easily accessible form. A single 3D movie is pretty big in a lossless format, so it doesn’t take much, even paring it down to the cream of the collection.

      • nowimnothing says:

        That is only about 200 uncompressed dual-layer DVDs. Start throwing in 30-disc TV series like Kids in the Hall and Monty Python (pretty sure I will watch those again) and that 2T goes fast.

  5. Bruce Miller says:

    . . . and the point is?
