Update on the Boing Boing post release for your weekend project

You are planning to make something cool with the last 11 years of Boing Boing posts, right? Here's a quick update on the release from earlier in the week:

• So far, the XML file I posted last week has been downloaded 2,500 times. Woo! We're very excited to see what you all do with it.

• macartisan on Twitter noticed some validation errors in the original XML file, and others of you saw similar issues. Fortunately, ntoll at FluidDB fixed these errors while working with the data. The XML file has been updated so you won't have to worry about wonky characters while parsing it.

• ntoll also converted the file to JSON for those of you who don't want to deal with XML. That file is available for download as well, and has some extra goodies like better category organization and a list of URLs and domains mentioned in each post.

• FluidDB has finished parsing the Boing Boing posts, so you can now access all 64,000 of them through their API. ntoll is also adding the URL and domain information from the JSON file to the API, and he'll be doing a write-up soon with some examples and explanations of how to use it.
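
If you want to confirm that the copy you downloaded is the corrected one, a quick well-formedness check is enough. Here is a minimal Python sketch using the standard library's expat bindings. The function name `first_xml_error` and the two sample documents are made up for illustration; nothing in it is specific to the Boing Boing file. It reports the line, column and message of the first error, or None if the document parses cleanly.

```python
import io
import xml.parsers.expat as expat


def first_xml_error(stream):
    """Return (line, column, message) for the first well-formedness error,
    or None if the whole stream parses.

    `stream` is any binary file-like object with a read() method.
    """
    parser = expat.ParserCreate()
    try:
        # ParseFile reads the stream in chunks, so a large file is never
        # held in memory all at once.
        parser.ParseFile(stream)
    except expat.ExpatError as err:
        return err.lineno, err.offset, expat.ErrorString(err.code)
    return None


# Illustrative inputs only: one clean document and one containing a
# control character, which XML 1.0 does not allow.
good = io.BytesIO(b"<posts><post>fine</post></posts>")
bad = io.BytesIO(b"<posts>\n<post>\x01</post></posts>")

print(first_xml_error(good))
print(first_xml_error(bad))
```

For the real thing, open the downloaded file in binary mode and pass the file object to `first_xml_error`.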

If you've got some time this weekend, and want to play around with a huge collection of text, URLs and other interesting information, we'd love to see what you come up with. You can send me your projects directly at dean@boingboing.net or on Twitter.
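
As a starting point for the URL and domain data, here is a rough Python sketch that counts how many posts mention each domain. The layout of the JSON file is not described in this post, so the `domains` field name, the shape of each record and the sample data are all assumptions; check the real file and adjust before relying on it.

```python
import json
from collections import Counter


def top_domains(posts, field="domains", n=10):
    """Count how many posts mention each domain and return the n most common.

    `posts` is a list of dicts; `field` is the (assumed) key holding a list
    of domain strings for that post.
    """
    counts = Counter()
    for post in posts:
        # Use a set so a domain linked twice in one post counts once.
        counts.update(set(post.get(field) or []))
    return counts.most_common(n)


# Hypothetical sample records standing in for the real file.
sample = json.loads("""
[
  {"title": "first",  "domains": ["example.com", "example.org"]},
  {"title": "second", "domains": ["example.com", "example.com"]},
  {"title": "third"}
]
""")

for domain, count in top_domains(sample, n=3):
    print(domain, count)
```

With the real file, replace `sample` with the result of `json.load()` on the download, and point `field` at whatever key actually holds the domains.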

Eleven years of Boing Boing posts available in [XML], [JSON] and via [FluidDB]


  1. gonna write a brown noise generator app that will cause something to shit on each of ya for every time you have typed the word ‘steampunk’

  2. You are planning to make something cool with the last 11 years of Boing Boing posts, right?

    Yes, I will make beautiful love with the posts. I will gently caress the posts with my rugged hands and bend them slightly to my will and whisper from my lips to the posts sweet nothings and dirty somethings.

    I’ll then slowly move my pouting lips closer and closer to the posts until we can’t take it anymore and we engage in a passionate, sloppy embrace. Afterwards, I’ll pull open a hatch hidden under the carpet and quickly rip out an old, love-stained gimp outfit.

    I’ll put that gimp outfit on you, posts. I’ll then force you posts on your knees and berate the bejesus out of you… you filthy posts.

  3. I just downloaded the XML file a second ago, and it’s not parsing for me. I tried the DOM parser and the SAX parser in Python, and both of them failed to parse it. They complained specifically about line 666 (THE XML-LINE OF THE DEVIL!) and said that there was an invalid token there.

    I’ll probably just use the JSON instead, but it would’ve been nicer to use the XML one. Oh well :)

    1. Oskar, same for me. (I was having SAX with the data.) There is a BUNCH of extraneous formatting in the file – it needs to be cleaned. Also, the tag data is html, so that needs to be parsed as well. Apparently FluidDB has done this, or is in the process of doing it, so I’d just use that data if it’s available and is true to the original.

      People should post the questions they want answered. I’m most curious about the basic stuff, like readership: which categories receive the most comments? (And a further question: is the number of comments a good proxy for click-through rates?)

      I’m also curious about the subgroup analysis: whether certain topics are more popular when one author posts them rather than another! Like, for instance, Xeni used to post sexbot links all the time… and now… we’ve been de-denuded on that “front.” Oh well, times they are a-changin.

      Anyways, what other questions are people looking to answer with this data? I’m curious what people intend to do.

      1. “. Also, the tag data is html, so that needs to be parsed as well”

        that should say… the BODY tag data. I wrote it with the brackets and it disappeared, thinking that was a tag in my post. Oops.

  4. Woot! I have a busy weekend ahead of me!

    1. I will unmash all the posts about mashups.
    2. I will steam clean all the posts about steampunk.
    3. I will revert all the subway posters back to indicating their original subway station names.
    4. I will unmake all the posts about makers.
    5. And I will just look at all the posts about bananas.
    1. lydsexia,

      can you eliminate the “delete DB” button? I would hate for someone to delete all your hard work… or at least keep a copy somewhere else, so that you can put things back if the worst should happen…

      1. Well, unless they somehow get a login, the “Delete” button does not work. You can add stuff to the db however. :-)

        This site is on a VM so if somebody makes a horrible, horrible, innocent mistake, or decides to perpetrate evil, I can just rebuild.

        Thanks for the polite warning, though! I was more worried about “UR so stupid LOL” than actual data munging. :-)

  5. In case you can’t tell, I’m quite taken with CouchDB.

    Should one want to replicate this data to another system, take a look here:


    If you already have couchdb-python installed, simply do:

    couchdb-replicate http://kphretiq.com:5984/boingboing http://localhost:5984

    I should have more goodies up tomorrow. For some reason my wife feels that tuckpointing the basement before the spring thaw takes precedence over mucking about with a bunch of old Boing Boing posts.

  6. oh good it doesn’t delete
    It pops a dialog box and I was afraid to actually click delete in case it did

  7. Instructions for use of “BoingBoing posts as JSON service via CouchDB” now available at kphretiq.com

    API is totally caveman at this point, but one can generate some totals and get posts based on some simple criteria.



    . . .gives total articles published, by author.

    . . .gives Cory’s total posts.


    . . .gives all YoungAmerican’s posts on 2009/10/05 between 4pm and 7pm.

    Have fun! I’ll add some more interesting stuff later.

  8. Haven’t done any cross-browser tests, but here’s something:

    BoingBoing Textagrabamorgraphier
    Gives you the reading ease, grade-level reading score, and words-per-sentence on average per month per author.

    BoingBoing Mapathaonalonagon
    A clickable google map with graph overlay – click anywhere in the world and see the number of posts that mention the placename, with handy hover-over links.
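
The last comment does not say which formulas sit behind its reading-ease and grade-level numbers. Assuming they are the standard Flesch Reading Ease and Flesch-Kincaid Grade Level scores, the calculation can be sketched in Python as below. The function names are made up for illustration, and the syllable counter is a crude vowel-group heuristic rather than a dictionary lookup, so treat the results as approximate.

```python
import re


def count_syllables(word):
    """Rough estimate: one syllable per run of vowels, minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def readability(text):
    """Return words per sentence plus the two Flesch scores for `text`."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)

    return {
        "words_per_sentence": words_per_sentence,
        # Flesch Reading Ease: higher means easier to read.
        "reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # Flesch-Kincaid Grade Level: approximate US school grade.
        "grade_level": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
    }


print(readability("The cat sat. The dog ran."))
```

To get per-month, per-author figures like the ones described, you would group the post bodies by author and month first and run `readability` on each group's combined text.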
