Update on the Boing Boing post release for your weekend project


16 Responses to “Update on the Boing Boing post release for your weekend project”

  1. Dave Faris says:

    Woot! I have a busy weekend ahead of me!

    1. I will unmash all the posts about mashups.
    2. I will steam clean all the posts about steampunk.
    3. I will revert all the subway posters back to indicating their original subway station names.
    4. I will unmake all the posts about makers.
    5. And I will just look at all the posts about bananas.
  2. Anonymous says:

    Haven’t done any cross-browser tests, but here’s something:

    BoingBoing Textagrabamorgraphier
    Gives you the reading ease, grade-level reading score, and words-per-sentence on average per month per author.

    BoingBoing Mapathaonalonagon
    A clickable Google map with graph overlay – click anywhere in the world and see the number of posts that mention the placename, with handy hover-over links.
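
For anyone curious how scores like the ones the Textagrabamorgraphier reports might be computed, here is a minimal sketch. It assumes the standard Flesch reading-ease and Flesch-Kincaid grade-level formulas and a crude vowel-group syllable counter; the comment does not say which formulas the tool itself uses, and the function names are made up for illustration.

```python
import re


def count_syllables(word):
    # Rough heuristic (an assumption): one syllable per run of vowels,
    # and every word counts as at least one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def readability(text):
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)

    return {
        "words_per_sentence": words_per_sentence,
        # Flesch reading ease: higher means easier to read.
        "reading_ease": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        # Flesch-Kincaid grade level: approximate US school grade.
        "grade_level": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
    }
```

Averaging these numbers per month per author would then only need a grouping step over the post records.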

  3. hpnsack says:

    gonna write a brown noise generator app that will cause something to shit on each of ya for every time you have typed the word ‘steampunk’

  4. Cowicide says:

    You are planning to make something cool with the last 11 years of Boing Boing posts, right?

    Yes, I will make beautiful love with the posts. I will gently caress the posts with my rugged hands and bend them slightly to my will and whisper from my lips to the posts sweet nothings and dirty somethings.

    I’ll then slowly move my pouting lips closer and closer to the posts until we can’t take it anymore and we engage in a passionate, sloppy embrace. Afterwards, I’ll pull open a hatch hidden under the carpet and quickly rip out an old, love-stained gimp outfit.

    I’ll put that gimp outfit on you, posts. I’ll then force you posts on your knees and berate the bejesus out of you… you filthy posts.

  5. lysdexia says:

    In case you can’t tell, I’m quite taken with couchdb.

    Should one want to replicate this data to another system, take a look here:


    If you already have couchdb-python installed, simply do:

    couchdb-replicate http://kphretiq.com:5984/boingboing http://localhost:5984

    I should have more goodies up tomorrow. For some reason my wife feels that tuckpointing the basement before the spring thaw takes precedence over mucking about with a bunch of old Boing Boing posts.
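
If couchdb-python is not installed, the replication described above can probably also be triggered through CouchDB's own HTTP API. The sketch below builds a POST to the `_replicate` endpoint of a local server. The local database name `boingboing`, the helper names, and the assumption that the remote server is still reachable are all illustrative, and recent CouchDB versions will likely also want admin credentials.

```python
import json
import urllib.request


def build_replication_request(source_db_url, local_server, target_db):
    # Use a full URL for the target so the request does not depend on
    # how a given CouchDB version resolves bare database names.
    base = local_server.rstrip("/")
    body = json.dumps({
        "source": source_db_url,
        "target": base + "/" + target_db,
        "create_target": True,  # create the local database if it is missing
    }).encode("utf-8")
    return urllib.request.Request(
        base + "/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def replicate(source_db_url, local_server, target_db):
    # Sends the request; this needs a running CouchDB on local_server.
    request = build_replication_request(source_db_url, local_server, target_db)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `replicate("http://kphretiq.com:5984/boingboing", "http://localhost:5984", "boingboing")` would then attempt the same pull as the command line above.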

  6. Oskar says:

    I just downloaded the XML file a second ago, and it’s not parsing for me. I tried the dom parser and the sax parser in python and both of them failed to parse. They complained specifically about line 666 (THE XML-LINE OF THE DEVIL!), said that there was an invalid token there.

    I’ll probably just use the json instead, but it would’ve been nicer to use the XML one. Oh well :)
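
For anyone who wants to find out what the parser is choking on before giving up on the XML, here is a minimal sketch using Python's built-in expat bindings. It reports the line, column, and message of the first error it hits; the function name is made up, and whether the real file has one bad line or many is unknown.

```python
import xml.parsers.expat


def find_xml_error(data):
    """Return (line, column, message) for the first parse error, or None."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return err.lineno, err.offset, message
    return None
```

An "invalid token" report typically points at a stray control character or an unescaped `&` or `<` in the text, so printing the offending line is usually the quickest next step.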

    • awjtawjt says:

      Oskar, same for me. (I was having SAX with the data) and there is a BUNCH of extraneous formatting in the file – it needs to be cleaned. Also, the tag data is html, so that needs to be parsed as well. Apparently fluidDB has done this, or is in process, so I’d just use that data if it’s available and is true to the original.

      People should post up what questions they want answered. I’m most curious about the basic stuff, like readership: what are the major categories that receive the most comments? (And the further question: is the number of comments a good proxy for click-through rates?)

      I’m also curious about the subgroup analysis, where certain topics among the post authors are more popular if a certain poster posts it rather than if another one does! Like, for instance, Xeni used to post sexbot links all the time… and now… we’ve been de-denuded on that “front.” Oh well, times they are a-changin.

      Anyways, what other questions are people looking to answer with this data? I’m curious what people intend to do.

      • awjtawjt says:

        “. Also, the tag data is html, so that needs to be parsed as well”

        That should say… the BODY tag data. I wrote it with the brackets and it disappeared; the site thought that was a tag in my post. Oops.
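
Since the body field holds HTML, one way to get plain text for analysis is the standard library's `html.parser`. This is a minimal sketch: the class and function names are made up, and it simply drops all tags, which may be too blunt if links or embeds matter to you.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(body_html):
    extractor = TextExtractor()
    extractor.feed(body_html)
    extractor.close()  # flush any text still buffered by the parser
    # Collapse runs of whitespace left behind by removed tags.
    return " ".join("".join(extractor.chunks).split())
```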

  7. lysdexia says:

    I have set up the data in couchdb at kphretiq.com, with an eye toward a json-based web service.

    Until I have a little something better, you can play with the data here:


    Try fiddling with the groupings. Have fun!

    • lysdexia says:

      (you’ll need to click “reduce” before groupings will work).

    • awjtawjt says:


      can you eliminate the “delete DB” button? I would hate for someone to delete all your hard work… or at least keep a copy somewhere else, so that you can put things back if the worst should happen…

      • lysdexia says:

        Well, unless they somehow get a login, the “Delete” button does not work. You can add stuff to the db however. :-)

        This site is on a VM so if somebody makes a horrible, horrible, innocent mistake, or decides to perpetrate evil, I can just rebuild.

        Thanks for the polite warning, though! I was more worried about “UR so stupid LOL” than actual data munging. :-)

  8. lysdexia says:

    Instructions for use of “BoingBoing posts as JSON service via CouchDB” now available at kphretiq.com

    API is totally caveman at this point, but one can generate some totals and get posts based on some simple criteria.



    . . .gives total articles published, by author.

    . . .gives Cory’s total posts.


    . . .gives all YoungAmerican’s posts on 2009/10/05 between 4pm and 7pm.

    Have fun! I’ll add some more interesting stuff later.
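
The example query URLs did not survive on this page, but the same kinds of totals can be worked out offline from the JSON dump. The sketch below assumes each post is a dict with hypothetical `author` and `created` (ISO timestamp) fields; the real field names in the dump or the CouchDB documents may differ.

```python
from collections import Counter
from datetime import datetime


def totals_by_author(posts):
    # Count how many posts each author has published.
    return Counter(post["author"] for post in posts)


def posts_in_window(posts, author, start, end):
    # Keep one author's posts whose timestamp falls in [start, end).
    return [
        post for post in posts
        if post["author"] == author
        and start <= datetime.fromisoformat(post["created"]) < end
    ]
```

For example, `posts_in_window(posts, "YoungAmerican", datetime(2009, 10, 5, 16), datetime(2009, 10, 5, 19))` would select that author's posts between 4pm and 7pm on 2009/10/05.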

  9. awjtawjt says:

    oh good it doesn’t delete
    It pops a dialog box and I was afraid to actually click delete in case it did

  10. Michael Buckbee says:

    I created a service called “Boingable” that does a Bayesian analysis of BoingBoing post titles and compares them to a post you might submit to BoingBoing for the editors to promote.

    Code and description at: http://www.buzzwordcompliant.net/2011/01/31/launching-boingable/

    Actual Site at: http://www.boingable.com
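
For readers wondering what a Bayesian analysis of titles can look like, here is a minimal naive Bayes sketch with add-one smoothing. It is not Boingable's actual code (see the link above for that); the class name, tokenizer, and labels are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(title):
    return re.findall(r"[a-z0-9']+", title.lower())


class TitleClassifier:
    def __init__(self):
        self.word_counts = {}        # label -> Counter of words seen
        self.doc_counts = Counter()  # label -> number of training titles
        self.vocab = set()

    def train(self, title, label):
        words = tokenize(title)
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def log_score(self, title, label):
        counts = self.word_counts[label]
        total = sum(counts.values())
        # Start from the log prior of the label.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(title):
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total + len(self.vocab)))
        return score

    def classify(self, title):
        # Requires at least one call to train() beforehand.
        return max(self.doc_counts, key=lambda label: self.log_score(title, label))
```

Trained on real Boing Boing titles under one label and unrelated headlines under another, `classify` would pick whichever label gives a candidate title the higher score.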
