AI learns to play Ms Pac Man

“Learning to Play Using Low-Complexity Rule-Based Policies: Illustrations through Ms. Pac-Man,” a new paper published in the Journal of Artificial Intelligence Research 30, details a very successful experiment in teaching an AI to play Ms. Pac-Man:
The researchers had agents play 50 games using different RL methods. They found that methods utilizing the cross-entropy policies performed better than methods that were hand-crafted. As they explained, the basic idea of cross-entropy is that it selects the most successful actions, and modifies the distribution of actions to become more peaked around these selected actions.
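For the curious, here is a rough sketch of that "select the best, then sharpen the distribution" loop. It is not the paper's code: the bit-vector search space, the `toy_score` function (a stand-in for "play a game and report the score"), and all parameter values are assumptions made purely for illustration.

```python
import random

def cross_entropy_search(score, n_bits, iterations=40, population=100,
                         elite_frac=0.2, step=0.5, seed=0):
    """Toy cross-entropy method over bit vectors (illustrative only).

    Each bit is sampled independently with probability probs[i]. After every
    round, the probabilities move toward the bit frequencies seen among the
    best-scoring samples, so the distribution becomes more peaked.
    """
    rng = random.Random(seed)
    probs = [0.5] * n_bits                      # start with no preference
    n_elite = max(1, int(population * elite_frac))
    for _ in range(iterations):
        # Sample candidate solutions from the current distribution.
        samples = [[1 if rng.random() < p else 0 for p in probs]
                   for _ in range(population)]
        # Keep only the most successful candidates.
        samples.sort(key=score, reverse=True)
        elite = samples[:n_elite]
        # Shift each probability toward what the elite candidates did.
        for i in range(n_bits):
            freq = sum(s[i] for s in elite) / n_elite
            probs[i] = (1 - step) * probs[i] + step * freq
    return probs

# Hypothetical scoring function: how many bits match a hidden target.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]

def toy_score(bits):
    return sum(1 for b, t in zip(bits, TARGET) if b == t)

probs = cross_entropy_search(toy_score, len(TARGET))
best_guess = [1 if p > 0.5 else 0 for p in probs]
```

In the paper's setting the thing being sampled would be a rule-based policy rather than a string of bits, and the score would come from actually playing Ms. Pac-Man, but the update step has the same shape.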

During the game, the AI agent must make decisions on which way to go, which are governed by rule-based policies. When the agent has to make a decision, she checks her rule list, starting with the rules with highest priority. In Ms. Pac-Man, ghost avoidance has the highest priority because ghosts will eat her. The next rule says that if there is an edible ghost on the board, then the agent should chase it, because eating ghosts results in the highest points.

One rule that the researchers found to be surprisingly effective was the rule that the agent should not turn back, if all directions are equally good. This rule prevents Ms. Pac-Man from traveling over paths where the dots have already been eaten, resulting in no points.
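The excerpt above describes a prioritized rule list with a "don't turn back" tie-breaker. A minimal sketch of how that could look follows; the state dictionary, its keys (`open`, `heading`, `ghost_near`, `ghost_dir`, `edible_ghost_dir`), and the two rule functions are all hypothetical names invented for this example, not the rules from the paper.

```python
OPPOSITE = {"up": "down", "down": "up", "left": "right", "right": "left"}

def avoid_ghost(state):
    """Highest priority: never step toward a nearby dangerous ghost."""
    if state.get("ghost_near"):
        return [d for d in state["open"] if d != state["ghost_dir"]]
    return None  # rule does not apply

def chase_edible(state):
    """Next priority: head for an edible ghost if one is reachable."""
    if state.get("edible_ghost_dir") in state["open"]:
        return [state["edible_ghost_dir"]]
    return None  # rule does not apply

# Rules are checked in order; the first one that applies wins.
RULES = [avoid_ghost, chase_edible]

def choose_direction(state):
    candidates = list(state["open"])
    for rule in RULES:
        picked = rule(state)
        if picked:
            candidates = picked
            break
    # Tie-breaker: among equally good options, do not reverse direction.
    forward = [d for d in candidates if d != OPPOSITE[state["heading"]]]
    return (forward or candidates)[0]

# No rule applies, so she keeps going instead of doubling back.
plain = choose_direction({"open": ["left", "right"], "heading": "right"})

# A ghost is ahead, so "right" is ruled out; "up" beats reversing to "left".
fleeing = choose_direction({"open": ["up", "left", "right"], "heading": "right",
                            "ghost_near": True, "ghost_dir": "right"})

# An edible ghost is behind her; the rule outranks the tie-breaker.
chasing = choose_direction({"open": ["left", "right"], "heading": "right",
                            "edible_ghost_dir": "left"})
```

Note that the tie-breaker only applies among directions the winning rule considers equally good, which is why she still reverses to chase an edible ghost.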

Link (via /.)


  1. This is some very sophisticated AI stuff; I can’t make much sense of the .pdf article. If anyone can enlighten us on how this is a major breakthrough from traditional logic-based AI, please feel free.

  2. The article says they chose Ms.PacMan because the ghosts were random. Wrong.

    The ghosts have priority levels of their own. They’re seemingly random, but have an attack order. It’s generally related to reducing either the vertical or horizontal distance and taking into account the direction of Pac-Man, then coming to the decision based on a set number of pixels before an intersection on the maze (some decide sooner, some later). The ghosts will clue the user in based on the shifting of their eyes. There are lengthy FAQs on this.

    Anyways, it was a neat survey… and seemingly random… but they could have possibly reverse engineered the ghosts’ rules first, then plotted out ways around that.

  3. How did Ms. Pac-man get to the position she is shown at in the illustration that accompanies this article?

  4. @#4 – I believe there was an Asimov (the writer, not the magazine) story that was based around the same idea; a robot that started as a proof-reader at a University, then eventually progressed to the point of writing the material it proofread. Ultimately the robot was sabotaged by a luddite professor, if I remember correctly.

    Life imitates art imitates life.

  5. The difference between RL and traditional AI is that in RL the agent is supposed to find the optimal strategy on its own (i.e. without a programmer telling it to go up or down for specific situations).

    The agent receives a penalty (or scores points) for every action it takes and by randomly trying out all possible actions it gradually learns which actions optimise its winnings, until it hopefully ends up with an optimal solution.

    However, these guys used some kind of hybrid where the agent isn’t totally free to do whatever it wants, but it has to use a pre-programmed rulebase. I think they did this because the game can be in a lot of possible ‘states’ (with all the ghosts and dots and directions) so a true RL algorithm would need an infeasible amount of memory (and learning time).

  6. @#5 You’re right, this image looks impossible. How did she eat all the dots in the center?

    Maybe there were no dots in the center to begin with? If that’s the case it seems like the current position would violate the “Don’t turn back” rule. Maybe there’s a higher ranking rule on power pellets or she died and just restarted.

  7. Jonathan V., there are FAQs written about Ms. Pac-Man, but those FAQs also mention that for the first few seconds of each board, the red and pink monsters have a (probably pseudo-)random element in order to throw off the patterns that made high-level Pac-Man play a matter of memorization.

    Also of note: the article itself states that the AI could do better than an average player, but its average score over 50 games is 8186. This is still no great shakes. (And it leads me to wonder where they found their human players, who had average scores of less than this.)

  8. An AI beating an AI. Yay!

    By the way, Boing Boing’s own AI just told me this comment could not be posted because “the text entered was wrong”. Is there a captcha that should be displaying?

  9. ‘During the game, the AI agent must make decisions on which way to go, which are governed by rule-based policies. When the agent has to make a decision, she checks her rule list, starting with the rules with highest priority.’

    How does a programmed set of prioritized rules amount to “AI learns to play ms pac-man” ? ? ?

    It looks more like a study in strategy and rule-ranking. The different definitions of AI: “imitation of intelligence” and “manufactured intelligence” seem confused in this post. The article itself seems confused about whether the programmers “taught” Ms. Pac-Man or if Ms. Pac-man taught herself.

    “AI agents who learned with the most successful policy […]”. The term ‘learned with’ here seems to simply mean “were given”, yet the article notes that “[ghost-luring] strategy didn’t evolve in the AI experiments.”

  10. “The article says they chose Ms.PacMan because the ghosts were random. Wrong.”

    What they mean is that the ghosts are non-deterministic (they may make different decisions given the exact same situation). Maybe just calling their behaviours “random” is a little simplistic, but they have some randomness (word?) incorporated into their decision-making process, in contrast with, say, the original Pac-Man, where their actions are deterministic (even when blue, which is only pseudo-random behaviour that becomes deterministic due to seeding), hence patterns working for that game. Hrmm.. that sentence was too long, oh well.
