BBC to delete 172 unarchived sites, geek saves them for $3.99

Cory Doctorow at 5:09 am Thu, Feb 10, 2011

The TV people are winning at the BBC, and so they're gutting their online budget, shutting down 172 websites. Many of them won't be archived by the Beeb, left to vanish forever.

But one license payer thinks this is stupid. So he spent $3.99 to crawl and make a torrent out of all the threatened BBC websites. You can download the file and keep it alive, preserving the media that our license fees have paid for even though the BBC can't be arsed to do it themselves.

When I found out the BBC would be deleting 172 of its websites, I spidered and downloaded all of the content under each of these top level directories on the bbc.co.uk domain. I purchased a $3.99 'low end box' type VPS server and began the crawl. In total this took just under 24hrs - and would have been quicker if I had been less kind to the BBC's servers. For the aforementioned cost of $3.99 for a cup of Starbucks coffee, anyone can obtain, store and keep this content alive and accessible to the general public. And with this torrent I've already done the heavy lifting of retrieving the data for you.
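The post doesn't say which tool did the spidering, so the following is only a rough Python sketch of the idea described above: stay inside a set of top-level directories on bbc.co.uk and pause between requests to be kind to the servers. The directory names are a small illustrative subset (they are mentioned elsewhere on this page), and `fetch_links`, the one-second delay, and the function names are all assumptions made for illustration.

```python
import time
from collections import deque
from urllib.parse import urlparse

# Illustrative subset only; the real crawl covered 172 directories.
CLOSING_DIRS = {"ww2peopleswar", "bigfinish", "horneandcorden"}
BBC_HOSTS = {"bbc.co.uk", "www.bbc.co.uk"}
CRAWL_DELAY_SECONDS = 1.0  # assumed politeness delay, not from the post


def in_scope(url, dirs=CLOSING_DIRS):
    """True if the URL is on bbc.co.uk and under one of the listed directories."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        return False
    if parts.hostname not in BBC_HOSTS:
        return False
    segments = [s for s in parts.path.split("/") if s]
    return bool(segments) and segments[0] in dirs


def crawl(seeds, fetch_links, delay=CRAWL_DELAY_SECONDS, sleep=time.sleep):
    """Breadth-first crawl that never leaves scope and waits between pages.

    fetch_links is a caller-supplied function (hypothetical here) that
    downloads a page and returns the URLs it links to.
    """
    seen = set()
    queue = deque(u for u in seeds if in_scope(u))
    visited = []
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        visited.append(url)
        for link in fetch_links(url):
            if in_scope(link) and link not in seen:
                queue.append(link)
        sleep(delay)  # be kind to the server
    return visited
```

Passing the fetcher and the sleep function in as arguments keeps the scoping logic testable without touching the network.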

This $3.99/month box is now hosting the content and making it available via both the web and BitTorrent.

Clearly the BBC has additional costs associated with its size and scale, compounded by the poor decision to sell off the organization's technical infrastructure to Siemens, from whom it now rents those services back. But even rounding up those 12 cups of coffee/year to £10,000/year, this still represents negligible budget impact and significant license payer value.
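The "12 cups of coffee/year" figure is simply the monthly VPS price multiplied out over a year. A small Python sketch of that arithmetic follows; the function name is illustrative, and the £10,000 figure is left out of the calculation because the post gives no exchange rate.

```python
from decimal import Decimal

VPS_USD_PER_MONTH = Decimal("3.99")  # price quoted in the post


def yearly_cost(monthly, months=12):
    """Cost of keeping the box running for a year, in the same currency."""
    return monthly * months


print(yearly_cost(VPS_USD_PER_MONTH))  # 47.88 (US dollars)
```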

Download the bbc.closing.sites.archive.torrent file

I write books. My latest is a YA science fiction novel called Homeland (it's the sequel to Little Brother). More books: Rapture of the Nerds (a novel, with Charlie Stross); With a Little Help (short stories); and The Great Big Beautiful Tomorrow (novella and nonfic). I speak all over the place and I tweet and tumble, too.


  • bibulb

    What, were the websites nothing but Patrick Troughton episodes of Doctor Who?

    You’d think they’d have learned their lessons by this point.

  • artaxerxes

    Thank you, Mecharius. Primary source material is invaluable to future historians. And contemporary judgement of primary source can’t possibly predict what will be of interest or importance 20, 50 or 100 years from now.

    Lack of serious archiving strategy plus the transitory quality of media is a very real and very dangerous threat to future understanding of our times.

  • Anonymous

    The BBC highups are either liars or morons.
    Anyone with a gram of technical knowledge knows that archiving these sites is trivial and the costs negligible.

    So why are they lying ?

    Who came up with this vast sum for archiving? Siemens? Whoever it is is defrauding the license payer.

    It was exactly the same before the iPlayer became Flash: the DG said only 3 Linux users visited the BBC and so there was no need to change from a Windows-only solution. As I accounted for more than 3, it was obvious someone was a stranger to the truth.

    And why is the net a second class citizen?
    Why can’t I watch Sky at Night online just after it first airs, instead of having to wait ages till it last airs before it gets put up?

    And then why can’t I watch previous episodes on the iPlayer? Why does the content expire after a week or 2 ?

    Clearly Murdochian fifth columnists have riddled the management structure. A clean sweep is required – off with the head of the BBC.

  • Anonymous

    If people would like to read the BBC’s side of this story they can find it here:

    http://www.bbc.co.uk/blogs/bbcinternet/2011/01/putting_quality_first_halving.html

    Thanks

    Nick Reynolds

    I am Social Media Executive, BBC Online

  • shome

    damn, that’s some money being spent on all that BBC content, wonder if there is anything useful

  • dodongo

    Among these saved sites is the hilarious web-only third series of Videogaiden. A video game review show out of Scotland, featuring cameos by Boingboing favorites Tim and Eric.

  • Anonymous

    the anonymous geek behind this is my hero because i love people that DO SOMETHING instead of just whining about how something ought to be done

    we have more resources at our disposal than ever before yet most of us just sit around crying the blues being part of the problem

  • Anonymous

    ITV just as bad….thank goodness for OCD and people like Bob Monkhouse and your good self…

  • Anonymous

    Seems as if Rupert Murdoch destroys every society he comes near. Can’t we ship him off to some uninhabited planet or something?

  • rjek

    The original site keeps fading in and out of accessibility. Mirror here (also on IPv6): http://bt.rjek./com/

  • Anonymous

    This is the same BBC that decided it wasn’t worth keeping episodes of Doctor Who. They’ve got a terrible track record of preserving their products for future generations.

    • larryy

      They also destroyed all the early Goon Shows. Love the Beeb, but responsible archivists they are not.

  • tcforest

    Anon at #30 said…
    “If people would like to read the BBC’s side of this story they can find it here: ”

    It says something that the Managing Editor of BBC Online doesn’t even know what a Top Level Domain is (as noted in the comments on the linked page)

  • Anonymous

    The deletion of the sites is purely political.

    The BBC needs to make visible cuts in places where the British (anti-BBC, mostly Murdoch) press accuse them of providing services that they believe should be provided by private companies.

    If the site doesn’t vanish the press wouldn’t see it as a real cut, would they?

  • Methusedalot

    This is funny. I have always thought the BBC needed to delete or update a lot of old show pages. I guess I was wrong and now the information is safe for future generations. I mean it will be really important in fifty years that the weekly schedule of series two of Monarch of the Glen is still available, right?

    • nibor

      The BBC has an internal database called INFAX that stores at least half a century’s worth of TV and Radio meta-data. It is an invaluable resource for those who want to see how language and entertainment tastes evolve in our own lifetimes. We needed a disclaimer due to the use of what we’d now call casual racism!

      We tried to expose the data to the public via the Programme Catalogue http://www.bbc.co.uk/catalogue but poor technology choices meant it was too costly to develop and maintain.

      It’s interesting to see how this closed trial has been archived and demonstrates how things used to be before iPlayer took over. I didn’t think the link would work but it does and lets me know the status of the project. For a long time link rot was a massive concern and the corp was proud that a link never died :(

    • GregS

      It’s not just trivia like schedules of Monarch of the Glen that they’re deleting. Read the Adactio post that Cory’s post references. For instance one site slated for deletion is http://www.bbc.co.uk/ww2peopleswar/ “An Archive of World War Two Memories – written by the public”, which has 47,000 stories and 15,000 photographs contributed by the public. That’s a potentially very valuable resource to future historians, especially in a decade or two when most of the generation that lived through that war will be gone. It’s a disgrace that something like that would be discarded just to save a bit of disk space.

      • Methusedalot

        GregS You have got me, that site, about WWII is definitely worth saving, and I am glad it was not scrubbed. I still think this attitude that every shred of data ever collected must be saved because it could be significant in the future is misguided. I offer this counter example from the archive; http://www.bbc.co.uk/bigfinish/
        The argument has been made here by others that everything must be saved, and that future generations will hash through it to determine what has value. Well I think we have the right as citizens of the present to make those value judgments as well. Maybe some things just belong in the bin. What we choose to call important will tell the future more about us than an unsorted list of webpages.

        • Anonymous

          Should we pick whether to archive everything or choose the best? For $3.50 a month, we can afford to do both.

    • Be_Reasonable

      Actually we have no idea what might be important in 50 years.

      • Methusedalot

        You are right. I can totally see a future where to stop a band of Scottish Separatists from blowing up a train, the government needs this information to recreate the childhood of the Separatist leader. He is kidnapped, drugged, put under hypnosis, and shown old videos and web pages from his youth. Soon he begins to remember and thinks to himself, “My Mum loved this program and back then we had working trains…it is wrong to blow up trains…I must make Mum proud.” Britain is saved! Now I see how this will save lives.

        • Mecharius

          If saving lives is your standard, let’s go ahead and burn most of the National Archives. Of course, pretty much all the university and collegiate archives will go, because who needs it, right? Just a waste of money, right? Who cares that each document is unique and tells us a great deal about the time in which the document was created.

          You may think these sites are useless, but there are future historians who will thank this guy for saving them.

          • Methusedalot

            You missed my point. I really believe this will save lives.

    • Anonymous

      I’m sure it will make researching someone’s future doctoral dissertation less arduous and more thorough.

    • RadioSilence

      who knows what we might find important in the future. that’s why it’s important to archive everything.

      if it’s archived and no use to anyone it’s doing no harm. if, on the other hand, all the data is lost and some of it turns out to be culturally important, then there’s nothing can be done to bring it back.

  • adamnvillani

    This sounds awfully familiar for the BBC — Doctor Who fans all know that many early episodes of the series are unavailable because they erased many of their archive tapes.

    • Anonymous

      Not just the BBC. Re-using tape was common practice because it was expensive and TV was still young enough that no one considered the future value of what was being erased.

    • lasttide

      The same thing almost happened to Monty Python’s Flying Circus until Terry Gilliam (iirc) stepped in and bought the tapes (or reels or whatever). Apparently it was standard policy back then to save a bit of money by reusing tape.

      Lesson to BBC: Cost of storage << cost of producing content

      • nibor

        >save a bit of money

        I spoke with some guys in BBC archiving recently who are currently digitizing analogue tapes from the last couple of decades, a long and tedious job by a very dedicated team.

        When you hear their history you realize just how hard it was to store fragile tapes for any length of time and how expensive it was to maintain. They have just moved into a new purpose-built complex because the old place was leaking and tapes were getting damaged.

        Storage is still a problem. Currently the broadcast-quality digitized files are being put back on to tape, as it’s considered the most effective long-term storage mechanism. There are an awful lot of tapes!

        • pauldavis

          the data in question here was already digitized. the issues of what to do with old analog (or digital) tape is entirely different to the issue of what to do with some parts of a website.

          • nibor

            True, very different. Deleting the files of a website seems like the petty cost-cutting activity it is, but re-using tapes in the 70s was done after a lot of consideration and apparently wasn’t popular then either.

            People have compared the two strategies already and that’s disingenuous to the parts of the organization who fight very hard to preserve all output.

            If this does happen I only hope that one day the BBC will run another Treasure Hunt asking people to look in their browser caches! Humorously (or not) the Treasure Hunt page is one tagged for deletion

  • Anonymous

    If Siemens is like the vendors that support my company, it’s $1k for a server a month. The app uses a database? That’s $1k for web server, $1k for db server. God help them if Siemens suggested each site needed an app server.
    So really this could have been
    $172,000 – $516,000 a month
    maybe even more…
    Unless you are one, vendors suck.

  • Deidzoeb

    LOL, Cory, you score bonus points for using the brit expression “even though the BBC can’t be arsed to do it themselves.”

  • fxq

    The BBC tried to erase Flying Circus but the tapes were rescued. However, many awesome brit comedy acts were binned by the BBC and lost forever. They just don’t understand the value of content. I guess that makes the BBC the anti-Disney?

  • Anonymous

    this reminds me of a friend who worked for the CBC. Apparently after Don Messer’s Jubilee show was cancelled at the height of its popularity, they junked all the old episodes, except for a handful that a janitor pulled out of the bin because his wife was one of the dancers on the show. The show ran for over 10 years and featured well known musicians from all over Canada and the US. As a fiddle player myself I would love to see some of those old episodes.
    Peter

  • Dr jayus

    I’ve just seen some of the sites that are closing. Like bbc.co.uk/horneandcorden. I’ll pay a few quid a month just to make sure I never accidentally land there ever.

  • chgoliz

    What a hero…and I don’t use that term lightly.

    Medieval monks “recycled” parchment. The resultant layers of erasure can sometimes be made out in parts: it’s called a palimpsest.

    Humans often have no idea the value of what we destroy that could so easily be saved.

    Meanwhile, we have football fields of plastic everywhere because the majority do not believe that recycling is worth the cost and bother.

    • peterbruells

      “Easily”? A parchment equals a goat and a lot of work. It’s not an insignificant piece of data storage.

      And regarding the palimpsest: I doubt it was all cases of “Well, that’s probably the last edition on Earth, but it’s boring anyway”. For all we know, most of them could have existed in dozens of copies when one got erased.

    • bcsizemo

      Actually recycling is a bit of a facade in the first place.

      When I place a small storage container that is labeled PP #5 in the recycling bin and they refuse to take it for whatever reason, then the system is broken.

      It’s against the law to throw away plastic bottles, but this container that has 10x the plastic as one bottle is going to the landfill….

      • chgoliz

        That’s part of what I mean: world-wide, there isn’t wholesale commitment to *real* recycling (to say nothing of the first two-thirds of that equation: “reduce” and “reuse”). Some plastics are environmentally harder to make and then to recycle, so putting them in with other plastics gums up the works.

  • Anonymous

    See? The big society works, yah?