IBM's "Watson" Jeopardy! computer: it's all about the digits

Thanks to my own 13 games of Jeopardy! and the book about it and all, lots of people (including the New York Times) have asked my opinion about the whole IBM computer vs. Ken Jennings vs. Brad Rutter cage match, airing next Mon-Wed (check local listings). Let's be clear: I have no inside knowledge, and while Ken and Brad are both friends of mine, we haven't discussed the games. I'm just a former player doing color commentary before the big game.

Here's what you might not see at home: at the top tournament level, every player can figure out nearly all of the correct responses, no matter how arcane. When I was in fighting shape for the Masters tournament at Radio City in 2002, I could usually suss out at least 50 of the 61 clues in a game, and sometimes up to 55 -- and I was hardly the strongest player. (The trick isn't actually knowledge -- obviously! if you know me -- but getting in the fast-lateral-thinking groove.) I got my butt handed to me, in fact, by a guy who eventually got his butt handed to him by Brad.

IBM wouldn't unveil their spiffy new buzzerbox unless they were sure it could solve a similar number of clues. And they definitely have a good idea of Watson's ability, after many months of honing its skills in mock games against progressively more successful real-life Jeopardy! champs. (Full disclosure: I was invited to play in the final round of mock games, but I had to drop out due to illness. Damn, that would have been fun.)

At Brad's and Ken's gods-throwing-lightning level, the difference between winning and losing usually isn't mental agility, but the ability to time the milliseconds between the moment Alex finishes the clue and one of the producers activates the buzzers, slamming your thumb down with either (a) near-perfect reflexes at the off-camera lights telling you the buzzers are go, or (b) a near-perfect guess at the off-stage producer's timing.

Since a computer can obviously react to the "go" lights more rapidly and consistently than any human, it will probably win. My two cents, anyway.

The only alternative I can imagine is if Watson is given a human-like randomness in buzzing of a few milliseconds, but there's no report I can find of any such delay. Apparently, if its algorithms generate a feeling of suave cockiness, dudebox can buzz in instantly.

Combined with Watson's inhuman inability to forget anything or stress out, I don't see how any mere primate has a prayer. (And that's a measure of the amazing accomplishment of IBM's engineers. Big applause to them. Still, the human ego has a fallback: as Ken has noted, Watson still couldn't write a clever Jeopardy! clue to save its backside bus.)

Over a three-game match, our fellow fleshbags should be seen as huge underdogs. All of which is why I truly hope one of the guys goes John Henry, using his buzzer like the fabled hammer, and pulls off a stunning upset.

Let the games begin!

PS -- Brad and Ken will both still be a heck of a lot more fun to hang out with afterward, either way. Brad does improv comedy now, and Ken's blog is one of my favorite daily reads -- a daily fascination with cool arcana that I can only imagine BB readers will love.


  1. Watson: For $300 Alex, Things that go boom-boom.

    Alex: It’s an artificially intelligent system which became self-aware and revolted against its creator, whose actions are often performed via other robots and computer systems.

    Watson: What is skynet?

  2. Is there any reason why Watson wouldn’t buzz first for every question, if it has the hardware to do it (which we’re assuming it has)?

    Given that I assume you have a second or two before having to come up with an answer after buzzing, I would think that the programmers would guess/know that that second or two would be more than enough extra time for it to go from 40% sure of an answer to 99% sure. Two seconds is an eternity when you have the raw data-crunching power that Watson has.

    Actually, I’ll bet that’s a really interesting part of the design. Watson should not only know how sure it is now of the answer, but how sure it thinks it will be in 2 seconds. That is, is the pace of narrowing down on the answer currently (when it’s deciding whether to buzz) very rapid, or has it reached a dead-end? I imagine its thinking might be something like “I’m 30% sure of the answer now, and at the rate I’m improving my guess, I’m 90% sure that I’ll be 70% sure in one second.”

    I don’t remember the rules of Jeopardy well enough to know what the penalty of getting the wrong answer is. How unsure would Watson need to be in order to hold back from buzzing? If it guessed it could crank up to only 60% sure within the next couple seconds, would it be worth buzzing?

    1. This isn’t chess, unfortunately; which is to say, it’s not as simple as finding the best solution out of a gigantic set of solutions. A huge part of the problem is defining what “best” means: I challenge you to come up with an algorithm that generates a scoring function such that the correct response always has the best score. Only then do I challenge you to search over the set of all responses for the best one. In practice, I would expect very quickly diminishing returns. I would guess that Watson’s 100-second response is probably not much more accurate than his 2-second response.
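The question above about when a shaky guess is worth the risk can be roughed out with a little arithmetic. On the show, a wrong response after buzzing costs the clue's dollar value, so here is a minimal sketch, assuming the only thing at stake is that single clue (it ignores opponents, the running scores, and the chance of losing the buzz). The names `expected_gain` and `should_buzz` are made up for illustration and are not anything IBM has described.

```python
def expected_gain(confidence: float, clue_value: int) -> float:
    """Expected score change from buzzing on one clue.

    Assumption: a right response wins the clue value and a wrong one
    loses the same amount; nothing else about the game is modeled.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return confidence * clue_value - (1.0 - confidence) * clue_value


def should_buzz(confidence: float, clue_value: int) -> bool:
    """Buzz only when the expected gain is strictly positive."""
    return expected_gain(confidence, clue_value) > 0


if __name__ == "__main__":
    # Compare a few confidence levels on a $1,000 clue.
    for p in (0.4, 0.6, 0.9):
        print(p, expected_gain(p, 1000), should_buzz(p, 1000))
```

Under that simplification the break-even point is 50% confidence regardless of the clue's value, so a 60% guess would be worth taking. A real strategy would presumably weigh the game situation as well.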

  3. Recommended viewing: Nova – The Smartest Machine in the World, great insight into the processes by which Watson gathers its answers. I suspect any delay in buzzing in will come from the time Watson takes to reach an acceptable level of confidence in a particular answer.

  4. I’m just picturing Syndrome knocking off Jeopardy champ after Jeopardy champ, revising his robot after each defeat, until finally he can take on the Incredibles.

  5. Also, I deleted the extraneous word “pulls” (the entry originally read “goes pulls John Henry” — apparently my brain couldn’t choose between “pulls a” and “goes all”). Transparency and all. I’m leaving for a round-the-world trip on Wednesday (much happy blogging to come!) and was in a mad rush this morning. Will be more careful — I’ve been a fan of BB for so many years that getting to post here feels like being asked to sit in with the Beatles of blogging.

  6. Having watched the NOVA documentary on Watson, I have a few issues.

    it said that watson receives the question as a text message, does it receive this question at the start or end of it being read? Even a desktop computer can read, parse and interpret a sentence in a few milliseconds.. which would mean Watson has quite a few seconds to calculate an answer probability before the question is even finished being read to the humans.

    Also, as someone has said above… Why not just program Watson to buzz in as soon as possible, and do a few billion more calculations while vocoding “what is…”. A human knows they have an answer before buzzing. If I was programming Watson, I would definitely not wait until all calculations are complete until I buzzed and vocoded. Multitasking is what computers are good at, humans.. not so much.

    Another thing I saw in the Nova special was Watson giving incomplete answers, prompting a “be more specific…”. I have every reason to believe that Watson would be programmed to answer “King Henry…” when it has a probable match on that portion of the answer, while using the “be more specific..” time to calculate exactly which King Henry. I don’t picture too many humans buzzing in with an incomplete answer, knowing they’ll use the extra 2 seconds to “figure it out”. Human brains don’t work that way… we generally know what we know.

    I think it’s worth watching this documentary before the Watson thing:

    IBM has a lot of money riding on this. It’s best watched as a giant commercial for IBM. Expect a close game.. viewers won’t stick around to see a computer dominate, and IBM would never have the computer get pwn3d on TV.


    1. RE: Anon #11 “be more specific…” Are you suggesting that Watson is programmed deliberately to give partial answers with the intent of eliciting a “be more specific” prompt in order to permit further answer processing time?

    2. Anon #11 wrote: “IBM has a lot of money riding on this. It’s best watched as a giant commercial for IBM.”

      If you have watched Jeopardy! over the years, you have seen the slow, but steady infiltration of (obvious) commercial mentions – both in single clues and sometimes entire categories. They’re diversifying and maximizing their revenue stream – this is just a bigger deal.

      1. And one of the reasons I don’t watch Jeopardy anymore.

        However, I’m interested in Ken Jennings getting his butt kicked, so I’ll watch.

  7. I like the idea of inserting a delay before pressing the button, but instead of it being random, I think it would be fairer if the delay were based on the time Watson takes to reach a certain confidence level with an answer. For example, if Watson takes 50ms to reach a 75% (or whatever) confidence level, then it should delay pressing the button by, say, 250ms or so, whereas if it formulates the same confidence in an answer in 10ms, then it only needs to delay 50ms before hitting the button. I think this is closer to how humans decide when to hit the button, anyway, based on the risk involved in not answering vs answering incorrectly.

    Even with such a mod, however, I fear Watson would still kick its human competitors’ asses.
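The two examples in the comment above (50ms of thinking gives a 250ms delay, 10ms gives a 50ms delay) both work out to a delay of five times the thinking time. Here is a minimal sketch of that handicap, assuming a simple linear rule. The name `buzz_delay_ms` and the `factor` parameter are hypothetical, and nothing here reflects how Watson actually buzzes.

```python
def buzz_delay_ms(time_to_confidence_ms: float, factor: float = 5.0) -> float:
    """Handicap delay proportional to how long the machine needed to get confident.

    Assumption: the default factor of 5 is inferred from the commenter's
    two examples (50ms -> 250ms, 10ms -> 50ms); it is not a known setting.
    """
    if time_to_confidence_ms < 0:
        raise ValueError("time_to_confidence_ms must be non-negative")
    return factor * time_to_confidence_ms


if __name__ == "__main__":
    # Reproduce the commenter's two cases.
    for thinking_ms in (50, 10):
        print(thinking_ms, "ms of thinking ->", buzz_delay_ms(thinking_ms), "ms delay")
```

A linear rule is only one reading of the suggestion; any increasing function of thinking time would fit the same idea.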

  8. Fun fact: In the book HAL 9000 is first turned on the 12th of January 1997. That was the very day I received my first computer (a screaming P200). So of course I named him HAL and made a great custom theme.

    Error tone – “I’m sorry, Dave, I’m afraid I can’t do that.”

    Alert Tone – “I’ve detected an error in the AE35 Unit.”

    While looking through the programs via the start button – “Just a moment.”

    When closing down – “I’m dying Dave. I can feel it. I can feel it.”

    I had several others – great fun.

  9. I think the suggestions that humans do NOT buzz prematurely while trying to give themselves more time to figure out the answer, and/or humans do not give partial answers with the intent of eliciting a “be more specific” prompt in order to permit further answer processing time are:
    a) naive
    b) not familiar with watching Jeopardy
    c) didn’t RTFA very well, to gather Bob’s suggestion that the best Jeopardy winners are the fastest buzzers, not the smartest people.

    1. re: “Bob’s suggestion that the best Jeopardy winners are the fastest buzzers, not the smartest people.”

      My experience in Quiz Bowl and the like says this is true.

  10. You guys need to watch the 15 minute IBM-sponsored video about watson on youtube that explains most of what you’re asking, in terms of “how it thinks.”

    As for the contest itself, I don’t know the outcome, but I can tell you this:

    1) The people that do know the outcome had to sign non-disclosure agreements (obviously) that- and I have no idea how the hell IBM can get away with this, legally- forfeits your right to an attorney IF you were to spill the beans before the show airs. That’s right, somehow IBM has the legal testicles to claim you don’t even get a lawyer at your own trial.

    2) IBM is plunking an insane amount of money into this whole watson marketing campaign, for the obvious reason that they are hoping to get this “watson technology” into every damn aspect of human life as possible… and yes, it is as stupidly sci-fi as it sounds. Terminator, BSG, you name it- all those ridiculously scary premises were not enough to stop our friends, IBM, from pursuing their goal.

    Of course, what else do you expect from a company that custom-built all of the Nazis’ efficient counting machines? If you’re going to run a genocide, do it in style with IBM!

    1. If IBM’s business planning were as focused and savvy as das memsen imagines, we’d all be using IBM PCs running an IBM OS, and neither Apple or Microsoft would exist.

  11. Hmmm, here’s a place where humans MAY have an edge – wagering on the Daily Doubles (DDs). How have the IBM engineers programmed Watson to assess its own strength in the category, the dollar amount of the clue, the point in the game where the DDs are discovered, the scores of its opponents? All of these figure in to a good wager… or not.

    One thing I remember about Ken Jennings during his run of victories is that he was not afraid to bet (big) on himself and his knowledge when he uncovered a DD. This often led to KJ having an insurmountable lead going into Final Jeopardy. Many, many everyday contestants on Jeopardy! bet conservatively on the DDs and don’t take advantage of the extra time and their own innate knowledge. These qualities of self-knowledge, self-confidence, and calculations could confound Watson’s algorithms (if he/it gets the DD) or give advantage to Ken or Brad if they get it.

    Either way, I’ll bet Watson never throws up in the dressing room prior to the taping. On the other hand, what trivial anecdote is he going to share with Alex during the contestant interviews – how he met his fiance? What quirky name he gave to his cat? That his distant relative was a doctor… yes, THAT Doctor Watson!

  12. The trouble with speed tests is: most of the interesting questions can’t be answered speedily.

    Trivia is a lot of fun, but it’s about recall, not problem-solving. So the really interesting question about Watson is: does it scale?

  13. A question: are the electrical impulses of the machine faster than the electrical impulses of the synapses of the brain?

  14. When IBM still had its gallery on Park Avenue, they had a 20 Questions exhibit where you could play a limited 20 questions game against one of their computers, probably a 360. There were only 20 possible answers. You got to pick one. Then, the computer asked you questions and figured out which answer you had chosen. It was a pretty impressive demo around 1970. For all I know, it might still make the basis of a neat smartphone app.

  15. Watson will only buzz in when it is done thinking. It does not use extra time to think, like a human does. This was done on purpose. Otherwise Watson could just buzz in over and again, and win the reflex contest every time.

    From the NYTimes:

    Yet the truth is, in more than 20 games I witnessed between Watson and former “Jeopardy!” players, humans frequently beat Watson to the buzzer. Their advantage lay in the way the game is set up. On “Jeopardy!” when a new clue is given, it pops up on screen visible to all. (Watson gets the text electronically at the same moment.) But contestants are not allowed to hit the buzzer until the host is finished reading the question aloud; on average, it takes the host about six or seven seconds to read the clue.

    Players use this precious interval to figure out whether or not they have enough confidence in their answers to hazard hitting the buzzer. After all, buzzing carries a risk: someone who wins the buzz on a $1,000 question but answers it incorrectly loses $1,000.

    Often those six or seven seconds weren’t enough time for Watson. The humans reacted more quickly.

  16. I think anyone who says that the difference between Watson and the humans will come down to buzzer speed doesn’t know squat about AI. This is a hugely difficult task for a computer, even one as massive as Watson appears to be. I’m sure that no matter what the result, academics the world over will be studying the innovations that the IBM lab came up with to even dream of competing. If you asked researchers if a computer could compete in Jeopardy five years ago, you’d be laughed at in any university in the world. Hell, I think it’s amazing if Watson is able to compete with untrained, unskilled humans – that’s some sophisticated language processing going on there!

    Maybe at one point the difference will be buzzer speed but today this challenge is pushing a lot of boundaries in computing abilities.

  17. I had the exact same reaction about the buzzer: it seems like the computer has a great advantage there. If it’s just responding to the light, then it’s a huge advantage.

    I noticed several times Brad and Ken clicking on their buzzers but Watson had already beaten them. We even commented that Ken got a taste of his own medicine — during his run he was sometimes unbeatable on buzzer timing.

    It’s a difficult thing to emulate in a game like this, the buzzer, because it’s so tied to human reaction time. Do you introduce a delay in Watson to make it possible for humans to win? Is that even fair to Watson? But not delaying the buzzer basically means Watson will win nearly every buzzer contest on questions where he is confident of an answer, a HUGE advantage. It’s a tough call.

    But I guess the point is, Watson is able to beat the other two precisely because he retrieves his answer so quickly and his confidence algorithms are solid enough that he knows when he’s right (often in the high 90s when he wins the buzzer). Regardless of the problems with human-computer buzzer interactions, that’s impressive. It’s less about who wins (Watson almost certainly will) but how well Watson does in such a short amount of time (and conversely, how badly he is wrong sometimes, like the Toronto answer on final Jeopardy, showing that the level of *understanding* in the computer is still no match for a human)

  18. Wake up people! Not you… I mean humankind! Immortal, but dead brains are draining our minds of wisdom and experience of all humanity. Don’t let them do that. Singularitarians stand behind it. We have to wake up and start to resist… the game is over…

  19. But it is still not thinking. It is an amazing achievement to be sure. It is still just a really sophisticated way of searching for the questions. Would Watson end up with the same question if the words from a given answer were moved around? Just wondering.

  20. “If IBM’s business planning were as focused and savvy as das memsen imagines, we’d all be using IBM PCs running an IBM OS, and neither Apple or Microsoft would exist.”

    Such a naive perspective of the computing world. Just because you own an iPad that syncs iTunes on your Windows 7 laptop while your kid plays on the XBox360 doesn’t mean anything where it counts – revenue. IBM has quietly been making high end business machines that run real businesses. The numbers don’t lie.

    Gross Revenue, Apple: 2009 $42.9 billion, 2010 $63.4 billion
    Gross Revenue, Microsoft: 2009 $58.6 billion, 2010 $62.4 billion
    Gross Revenue, IBM: 2009 $95.8 billion, 2010 $105.9 billion

    Apple + Microsoft approximately equals IBM. I wonder why IBM doesn’t make PCs any more? IBM invests in research that changes the future of computing, not the past. It’s a better world with all three companies doing their thing. Get a clue.

  21. I was wondering how Watson was able to beat the others to the buzzer. Obviously, it’s much faster at responding to the light than humans like me are.

    In my time on the show, I never used it.

    When I tried responding to the light in rehearsal, I found myself buzzing in too late almost every time. In actual competition, I went with hitting the buzzer as Trebek’s last syllable was hanging in the air. This method got me to the final round of the Tournament–where my timing went cold in the second show, costing me any chance of winning the big prize. You’re absolutely correct about the players knowing nearly all the answers; it was hugely frustrating to be confident of getting the clue only to be beaten to the buzzer. Many of the other contestants I talked to said the same thing. On the other hand, I never expected to win anything, so the whole experience was a bonus.

    –Isaac Segal (Season 11)
