Virtual monkeys recreate Shakespeare

Jesse Anderson set out to recreate every single work of Shakespeare at the same time by means of virtual monkeys that are simulated on Amazon's cloud computing platform. One million virtual monkeys create virtual text around the clock, and if any of that text matches any of Shakespeare, it is saved to the repository.

On September 23rd, the monkeys recreated A Lover's Complaint.

For this project, I used Hadoop, Amazon EC2, and Ubuntu Linux. Since I don’t have real monkeys, I have to create fake Amazonian Map Monkeys. The Map Monkeys create random data in ASCII between a and z. It uses Sean Luke’s Mersenne Twister to make sure I have fast, random, well-behaved monkeys. Once the monkey’s output is mapped, it is passed to the reducer, which runs the characters through a Bloom Filter membership test. If the monkey output passes the membership test, the Shakespearean works are checked using a string comparison. If that passes, a genius monkey has written 9 characters of Shakespeare. The source material is all of Shakespeare’s works as taken from Project Gutenberg.
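The generate, filter, then compare flow described above can be sketched in a single Python process. This is not Anderson's code: it replaces Hadoop with a plain loop and replaces Sean Luke's Java generator with Python's built-in `random.Random`, which is also an MT19937 Mersenne Twister. It also assumes the source text is reduced to lowercase a to z with spaces and punctuation stripped. The Bloom filter sizing (about 10 bits and 7 hash positions per 9-letter window) is an illustrative choice. `BloomFilter`, `build_index`, `check`, and `run_monkeys` are hypothetical names.

```python
import hashlib
import random
import string

CHUNK = 9  # group length described in the post


class BloomFilter:
    """Minimal Bloom filter: k bit positions per item in an m-bit array."""

    def __init__(self, m_bits: int, k_hashes: int) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k positions from one 128-bit BLAKE2b digest.
        digest = hashlib.blake2b(item.encode("ascii"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def normalize(text: str) -> str:
    """Assumption: keep letters only, lowercased (no spaces or punctuation)."""
    return "".join(c for c in text.lower() if c in string.ascii_lowercase)


def build_index(source: str):
    """Return (Bloom filter of every 9-letter window, normalized source)."""
    letters = normalize(source)
    windows = {letters[i:i + CHUNK] for i in range(len(letters) - CHUNK + 1)}
    # Roughly 10 bits and 7 hash positions per window gives about a 1% false-positive rate.
    bloom = BloomFilter(m_bits=10 * max(len(windows), 1), k_hashes=7)
    for w in windows:
        bloom.add(w)
    return bloom, letters


def check(chunk: str, bloom: BloomFilter, letters: str) -> bool:
    """Cheap membership test first; exact substring comparison only on a Bloom hit."""
    return chunk in bloom and chunk in letters


def run_monkeys(bloom: BloomFilter, letters: str, n_chunks: int, seed: int = 0) -> list:
    """Generate n_chunks random 9-letter strings and keep those found in the source."""
    rng = random.Random(seed)  # CPython's random.Random is an MT19937 Mersenne Twister
    hits = []
    for _ in range(n_chunks):
        chunk = "".join(rng.choices(string.ascii_lowercase, k=CHUNK))
        if check(chunk, bloom, letters):
            hits.append(chunk)
    return hits
```

A Bloom filter never gives false negatives, so every genuine match passes the first test. The exact comparison only has to run on the small fraction of chunks that get through, including the occasional false positive.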

(via /.)


  1. I’m tempted to make a joke about how few monkeys it would take to recreate the works of certain other authors I could name but I think I’ll pass.

  2. What a waste of human effort and electrical power.  There is no shortage of real problems to be solved in the world.  Go fix one.

    1. why thinkest thou that this does not pertain *
      to real problems, such as whence *
      the spark of intellect – mine, hers – and mayhap yours *
      doth stem?

      now fie! we’ll have no more *
      of thine insipid slandering *
      and this fine gentleman’s vocation *
      – so blessful and so blissful –  shall continue.

    2. I could say the same about your post.  You clearly know what needs to be done.  How dare you criticize people on Boingboing instead of fixing it?

    3. I was thinking along the same lines, though not as coarsely. There’s nothing wrong with filling your “bored” time with such trivial pursuits as virtual monkey projects or say…commenting on Boing Boing posts…as long as you spend some of your waking hours in noble deeds as well! Carry on…

  3. Richard Dawkins once did something like this in an ’80s TV documentary (“The Blind Watchmaker”; it’s on YouTube), where he had two programs compete. One adaptively picked letters “more similar” to the line “methinks it is like a weasel”, and the other just fired off random letters until the phrase came up by coincidence. This showed that evolution does not equal chance.
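Here is a minimal sketch of the “weasel” comparison described in that comment. The target phrase in Dawkins’s The Blind Watchmaker is “METHINKS IT IS LIKE A WEASEL”, written with a 27-symbol alphabet (26 capital letters plus space). The offspring count (100) and per-letter mutation rate (5%) are assumed values, not taken from the book or the documentary. Each generation, the program keeps the best-scoring child, which is cumulative selection. For comparison, it also prints the odds of hitting the phrase in a single purely random attempt.

```python
import random
import string

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "  # 27 symbols


def score(candidate: str) -> int:
    """Number of positions that already match the target."""
    return sum(c == t for c, t in zip(candidate, TARGET))


def mutate(parent: str, rate: float, rng: random.Random) -> str:
    """Copy the parent, replacing each character with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c for c in parent)


def weasel(offspring: int = 100, rate: float = 0.05, seed: int = 1) -> int:
    """Cumulative selection: return the number of generations needed to hit TARGET."""
    rng = random.Random(seed)
    parent = "".join(rng.choice(ALPHABET) for _ in TARGET)
    generation = 0
    while parent != TARGET:
        generation += 1
        children = [mutate(parent, rate, rng) for _ in range(offspring)]
        parent = max(children, key=score)  # best child becomes the next parent
    return generation


if __name__ == "__main__":
    print("cumulative selection:", weasel(), "generations")
    # Pure chance: one random 28-character string matches with probability 27**-28.
    print("single random attempt: 1 in", len(ALPHABET) ** len(TARGET))
```

Selection finishes in a modest number of generations. A single random attempt succeeds with probability 1 in 27^28, which is the gap the comment is pointing at.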

  4. Any 9 consecutive characters that appear anywhere in Shakespeare count as a qualifying hit? That seems very generous.

    I do believe they are supposed to reproduce the work as a whole, not reproduce the work 9 characters at a time and have an intelligent agent stitch it together.  Defeats the whole purpose.

    1. That was my first thought. He’s cheating by removing word order as a factor in the monkey’s creation. So if one monkey comes up with “cleopatra”, it’s going to count as 28 hits?

  5. I think it’s officially time to bury that saying. Even with an infinite amount of time, it is not feasible that an exact copy of Willie the Shakes could be created, much less a work as intriguing. I will now type a sentence never before spoken or written in the history of this planet:
    “Jilly’s clomb-bytes overbit on a heard of fradulapods smoking jaundiced pepperlips on-board the beleaguered ship of Xeni.”
    A connection of words is no small thing. Give Willie the respect he earned.

    1. “Give Willie the respect he earned.”
      I agree with the spirit of your comment, but I think you also should give some respect to infinity.  Regardless of how good Willie was (is), infinity is enough time to make more. :)

    2. I think you’re missing the point of the saying.  It means that even genius can arise from chaos.  Shakespeare isn’t a master because he wrote Macbeth.  Shakespeare is a master because it only took him one shot.

      1. But I agree. Genius did arise from chaos. Imagine the world Shakes lived in and the art he was able to pull out of it. I think great art only comes from that chaos and the infinity of brain synapses in action and under stress.

        Speaking of a better use of time… I would much rather enjoy the work those million monkeys could create on their own than a copy of a master work previously conceived.
        Free The Million Monkeys! They have their own work to create!

      1. Methinks infinity only exists in theory... as with amillionmonkeys and deities. In practice, all things and times have limitations. Zero is an infinitely large number.

    3. I’m pretty sure you’re wrong about that.  In an infinite amount of time, absolutely every possible combination of text would be generated, and of course this would include the entire works of Shakespeare.  In fact, it would include an infinite number of variations of the complete works of Shakespeare, an infinite number of which would be a close enough match to qualify (for example, there would be some versions with the works in one order, others in another, etc.)

    4. Every time I say “foo” I am saying that sentence in a made-up language, because in my made-up language, “foo” stands for everything which has been and will ever be said.
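The third reply above argues that an infinite random stream eventually contains any given text. A short argument supports this, sketched here under the same simplification Anderson used: a 26-letter alphabet with no spaces or punctuation, and letters drawn independently and uniformly. Split the stream into consecutive, non-overlapping blocks of length n. Each block equals a fixed target text of length n with probability 26^{-n}, independently of the other blocks, so:

```latex
P(\text{no match in the first } k \text{ blocks}) = \left(1 - 26^{-n}\right)^{k}
  \xrightarrow[k \to \infty]{} 0,
\qquad
\mathbb{E}[\text{blocks until the first match}] = 26^{n}.
```

The first limit is why the theorem holds “eventually”. The second shows why it is useless in practice. Even for n = 9 the expected number of blocks is 26^9 = 5,429,503,678,976, and each additional letter multiplies it by 26.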

  6. Sniffy sourpusses leave impotent critiques in blog comboxes while real dudes do stuff that sourpusses notice.

  7. Meh, this is cheating.  I think the idea is that they need to recreate it all in one go, not 9 characters at a time. 

  8. @ObstacleMan and @boingboing-13c5761fbc4d10bc361221c281f84190: That’s exactly the point. There’s a HUGE amount of information contained in the arrangement of those 9-character chunks, all of which is being contributed by the primary text.
    If the process used variable-length words instead of 9-character chunks (which would be substantially similar), the story could be less misleadingly and more modestly titled “Virtual monkeys recreate all words used by Shakespeare.” Unfortunately, the 9-character thing is just complex enough that it’s hard to make the writing on the tin match what’s inside. I presume that 9 characters was picked as the longest group length that would finish running in a reasonable time; anyway, it represents just how far away from actual infinity we are here.

    Virtual monkeys < Shakespeare < Infinity, with enormous distances between the values.

    The presentation (and self-presentation) of this as a "recreation" of any primary text is an unfortunate triumph of cute over thought and may fuzz people's intuition about the extraordinary degree to which writing is anti-entropy.

    Library-of-babel-fail :(

    1. Reading the article and watching the video, it doesn’t seem to me that his point is to illustrate the amount of information in 9-character chunks. He states he’s trying to emulate the original saying, but because he doesn’t have infinite resources, he’s going to do it in 9-letter segments, with no punctuation or capitalization. Meh. If my “resources” were a few old Apple IIes, I could have chosen 1- or 2-character segments and made the same claims. Both cases simplify the original idea to the point of irrelevance. I just don’t see the point: matching segments is not anything even remotely near what the saying implies. The line in the article saying “This is the first time a work of Shakespeare has actually been randomly reproduced.” is particularly misleading.

      A far more accurate story title would be “virtual monkeys recreate tiny snippets of Shakespeare, which are then rearranged to produce his work”.

      I hate to be one of THOSE GUYS, but the misleading nature of this prompts me to say that running some sort of Folding/Rosetta/BOINC/etc would be a better use of that same computer power.

      1. I totally agree–the information point was just my (abstruse) way of getting at the problem here.

        If a programmer wants to show off some tricks with Hadoop, more power to her. The Bloom Filter approach which Anderson used is actually kind of cool–a waste of cycles in this case, but as a dry run for some meaningful project, sure!

        The problem is how it’s been reported by sites such as Boing Boing. Folks who should know better but who still repeat the “recreated” claim, as if the alphabet hasn’t always already recreated every possible text in exactly the same sense–THAT, I have a problem with. 

        My guess is that it’s a subconscious refusal to understand–probably because, on some level, many BB readers/editors {myself included!} emotionally want this story to be meaningful, since it would be Really Cool if random processes could actually come anywhere near the useful-information level of a Shakespeare play (which would almost certainly take quantum or biological computing, and which in any case couldn’t get you past the Library of Babel selection problem.)

        Cory/BB should do a retraction/clarification of this story, no question.

  9. This is a great demonstration of the power of distributed computing. Sure, the guy didn’t cure cancer, but his demonstration is likely to inspire others to use his methods towards solving more altruistic problems.

    Now if we could only harness the half billion primates clicking away on Facebook for good.

    1. This is a demonstration of the *limits* of distributed computing. Put differently:

      arrows oto sleep,stion:
      Wh troublese, that i this mor sleep ofTo die, t,
      When we may comend
      The henatural s sleep toous fortu a sea of say we ehere’s thf outrageTo be, or wished. e rub:
      Devouart-ache,: ’tis a hocks
      Tha not to bo sleep;
      To sleep,r in thatm – ay, tether ‘tine,
      Or to take armfer
      The ss against and the
      No more; and by aconsummat,
      And by opposing thousand tal coil,ffled offlings and have shu
      Must giv death whnd to sufat dreamss heir tos nobler e to dreae us paustly to be To die, t
      flesh iend them?in the mi perchancs the que

      Congratulations, monkeys! (This text would have been equally valid to the experiment–and it STILL contains an immense contribution from Shakespeare.)

      ===Roll your own: =====================================

      import random

      s = """To be, or not to be, that is the question:
      Whether ’tis nobler in the mind to suffer
      The slings and arrows of outrageous fortune,
      Or to take arms against a sea of troubles,
      And by opposing end them? To die, to sleep,
      No more; and by a sleep to say we end
      The heart-ache, and the thousand natural shocks
      That flesh is heir to: ’tis a consummation
      Devoutly to be wished. To die, to sleep;
      To sleep, perchance to dream – ay, there’s the rub:
      For in that sleep of death what dreams may come,
      When we have shuffled off this mortal coil,
      Must give us pause – """

      # Cut the passage into consecutive 9-character chunks (a leftover tail shorter than 9 is dropped)
      s_nines = [s[i:i + 9] for i in range(0, len(s) // 9 * 9, 9)]

      # Shuffle the chunks and glue them back together
      random.shuffle(s_nines)
      print("".join(s_nines))


  10. Waaaaaaaait a minute. This has been even more misrepresented than I thought. He’s just creating all 26**9  = 5,429,503,678,976 possible 9-character sequences of letters:

    #1: aaaaaaaaa
    #2: aaaaaaaab

    and then finding those in his source work. He’s now output all 5.4 trillion–which means that, by his standards of what an English ‘text’ is (no capitalization, no punctuation, no whitespace, no newlines), his monkeys have now ‘recreated’ ALL TEXTS IN ENGLISH. They did a *good job.*

    This approach recreates texts the way an alphabet recreates texts. (In fact, why not just pick a one-character group length? Woulda been a lot quicker…)

    This is a nice programming exercise which has been hyped beyond any recognition. Sorry.
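The numbering in the comment above (#1: aaaaaaaaa, #2: aaaaaaaab, …) is just base-26 counting. The sketch below maps between a position and its 9-letter sequence in both directions; `nth_sequence` and `sequence_number` are hypothetical helper names. It also shows the factor-of-26 growth for each extra letter that the next comment mentions. It only illustrates the counting and says nothing about whether Anderson enumerated sequences or sampled them at random as the post describes.

```python
import string

LETTERS = string.ascii_lowercase
LENGTH = 9


def nth_sequence(n: int, length: int = LENGTH) -> str:
    """Return the n-th (1-based) sequence in a-z lexicographic order."""
    total = 26 ** length
    if not 1 <= n <= total:
        raise ValueError(f"n must be in 1..{total}")
    k = n - 1
    chars = []
    for _ in range(length):
        k, r = divmod(k, 26)  # peel off base-26 digits, least significant first
        chars.append(LETTERS[r])
    return "".join(reversed(chars))


def sequence_number(seq: str) -> int:
    """Inverse of nth_sequence: the 1-based position of seq."""
    k = 0
    for c in seq:
        k = k * 26 + LETTERS.index(c)
    return k + 1


if __name__ == "__main__":
    print(nth_sequence(1), nth_sequence(2))  # aaaaaaaaa aaaaaaaab
    print(26 ** LENGTH)                      # 5429503678976
    print(sequence_number("tobeornot"))
    for length in range(1, 11):
        print(length, 26 ** length)  # each extra letter multiplies the count by 26
```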

  11. @boingboing-c184cf3f84122a8648c052bfbded9949 is quite right.  And furthermore, Jesse is ignoring spacing and punctuation (which Shakespeare certainly did use), which would make it many times harder.  And the next “refinement” of this we could expect from Jesse would be to do it 10 characters at a time, which would take 26 times as much effort.  But that would be far more stupid than just doing this one character at a time – and a little computer could do that in hardly any time at all….

  12. @bruce, if the monkeys got digital, they could just randomly type until they got “0” and “1”, then go out to eat bananas, and thus by this logic recreate any multimedia experience of any kind, now or in the future, in any format, even those yet to be created!  ….given the right source material and search algorithm….

  13. Whoa, has no one here realized that the beloved Monochrom were doing this 6 years ago?
    Here is their experiment:

  14. I don’t understand why he is using Sean Luke’s rewrite of the Mersenne Twister in Java; I would think that the original MT (which is written in C and also optimized for SSE) would be better: in my tests, it generates a 64-bit pseudorandom number in a few nanoseconds….
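Whichever Mersenne Twister implementation is used, one way to reduce generator calls is to derive a whole 9-letter chunk from a single 64-bit output, because 26^9 (about 5.4 × 10^12) is below 2^43. The sketch below uses Python’s `random.Random`, whose core generator is MT19937. Rejection sampling keeps the letters exactly uniform. This technique is an illustration only; neither Anderson nor the commenter describes it, and `chunk_from_64_bits` is a hypothetical name. Python timings will not resemble the C or Java figures discussed above.

```python
import random
import string

LETTERS = string.ascii_lowercase
CHUNK = 9
SPACE = 26 ** CHUNK                 # number of possible 9-letter chunks
LIMIT = (2 ** 64 // SPACE) * SPACE  # largest multiple of SPACE that fits in 64 bits


def chunk_from_64_bits(rng: random.Random) -> str:
    """Turn one 64-bit Mersenne Twister draw into 9 uniformly random letters."""
    while True:
        x = rng.getrandbits(64)
        if x < LIMIT:  # reject the small tail that would bias the result
            break
    x %= SPACE
    chars = []
    for _ in range(CHUNK):
        x, r = divmod(x, 26)  # each base-26 digit becomes one letter
        chars.append(LETTERS[r])
    return "".join(chars)
```

A draw is rejected with probability below 26^9 / 2^64, which is under 3 in 10 million, so almost every chunk costs exactly one 64-bit draw.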

  15. A monkey writing Shakespeare? I thought that was the plot of Anonymous. (Review on my blog, linked in my profile.) That aside, arkle has pretty much nailed it.

    — James Ph. Kotsybar

    The keyboard monkeys, ad infinitum, 
    may randomly type out a Shakespeare play,
    but publishing monkeys may well spite ’em, 
    printing only the pap that sells today. 

    The chance of brilliance, vanishingly small,
    is filtered through some monkey business sense 
    and may never make the bookshelves at all — 
    some monkeys see only dollars and cents. 

    A universe of monkey time and space 
    might type a script, enduring and concise, 
    which only may reach its true, valued place
    if editor apes judge it worth the price. 

    Eternity brings this to fruition: 
    “Thank you for sending us your submission…”
