Except where indicated, Boing Boing is licensed under a Creative Commons License permitting non-commercial sharing with attribution
Charlie Warzel: "THIS is what google's self driving car can see. So basically this thing is going to destroy us all." [via Matt Buchanan]
A "Snowball" is a poem "in which each line is a single word, and each successive word is one letter longer." Nossidge built an automated Snowball generator that uses Markov chains, pulling text from Project Gutenberg. It's written in C++, with code on GitHub. The results are rather beautiful poems (the featured ones are "mostly Dickens").
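For the curious, here is a minimal sketch of the idea in Python. It is not Nossidge's C++ code; the function names `build_chain` and `snowball` and the toy corpus are assumptions made up for illustration. It builds a first-order Markov chain (each word mapped to the words seen after it) and then only follows transitions to a word exactly one letter longer.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that directly follow it in the text."""
    words = [w.lower() for w in text.split() if w.isalpha()]
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def snowball(chain, start, rng=random):
    """Grow a poem from `start`; each next word must be one letter longer."""
    poem = [start]
    while True:
        last = poem[-1]
        candidates = [w for w in chain.get(last, []) if len(w) == len(last) + 1]
        if not candidates:
            return poem  # no longer word follows, so the poem ends here
        poem.append(rng.choice(candidates))

# Toy corpus (hypothetical); the real generator reads Project Gutenberg texts.
corpus = "i am all over these narrow bridges i am the best"
poem = snowball(build_chain(corpus), "i", random.Random(0))
print("\n".join(poem))
```

The loop always terminates, because every step makes the word longer and the corpus has a longest word.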
In Wired, Steven Levy has a long profile of the fascinating field of algorithmic news-story generation. Levy focuses on Narrative Science, and its competitor Automated Insights, and discusses how the companies can turn "data rich" streams into credible news-stories whose style can be presented as anything from sarcastic blogger to dry market analyst. Narrative Science's cofounder, Kristian Hammond, claims that 90 percent of all news will soon be algorithmically generated, but that this won't be due to computers stealing journalists' jobs -- rather, it will be because automation will enable the creation of whole classes of news stories that don't exist today, such as detailed, breezy accounts of every little league game in the country.
Narrative Science’s writing engine requires several steps. First, it must amass high-quality data. That’s why finance and sports are such natural subjects: Both involve the fluctuations of numbers—earnings per share, stock swings, ERAs, RBI. And stats geeks are always creating new data that can enrich a story. Baseball fans, for instance, have created models that calculate the odds of a team’s victory in every situation as the game progresses. So if something happens during one at-bat that suddenly changes the odds of victory from say, 40 percent to 60 percent, the algorithm can be programmed to highlight that pivotal play as the most dramatic moment of the game thus far. Then the algorithms must fit that data into some broader understanding of the subject matter. (For instance, they must know that the team with the highest number of “runs” is declared the winner of a baseball game.) So Narrative Science’s engineers program a set of rules that govern each subject, be it corporate earnings or a sporting event. But how to turn that analysis into prose? The company has hired a team of “meta-writers,” trained journalists who have built a set of templates. They work with the engineers to coach the computers to identify various “angles” from the data. Who won the game? Was it a come-from-behind victory or a blowout? Did one player have a fantastic day at the plate? The algorithm considers context and information from other databases as well: Did a losing streak end?
Then comes the structure. Most news stories, particularly about subjects like sports or finance, hew to a pretty predictable formula, and so it’s a relatively simple matter for the meta-writers to create a framework for the articles. To construct sentences, the algorithms use vocabulary compiled by the meta-writers. (For baseball, the meta-writers seem to have relied heavily on famed early-20th-century sports columnist Ring Lardner. People are always whacking home runs, swiping bags, tallying runs, and stepping up to the dish.) The company calls its finished product “the narrative.”
Both companies claim that they'll be able to make sense of less-quantifiable subjects in the future, and will be able to generate stories about them, too.
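The pipeline in the excerpt (find the pivotal play, pick an angle, fill in a template) can be sketched in a few lines of Python. This is only an illustration under assumed names: `pivotal_play`, `choose_angle`, `write_story`, the templates, the six-run "blowout" threshold, and the teams are all hypothetical, and none of it is Narrative Science's actual system.

```python
# Hypothetical templates, standing in for the "meta-writers'" vocabulary.
TEMPLATES = {
    "comeback": "{winner} rallied past {loser} {w}-{l}.",
    "blowout": "{winner} routed {loser} {w}-{l}.",
    "default": "{winner} beat {loser} {w}-{l}.",
}

def pivotal_play(plays):
    """Return the play with the largest swing in win probability."""
    return max(plays, key=lambda p: abs(p["after"] - p["before"]))

def choose_angle(game):
    """Pick a story angle from the data; the 6-run threshold is an assumption."""
    if game["winner_trailed"]:
        return "comeback"
    if game["winner_runs"] - game["loser_runs"] >= 6:
        return "blowout"
    return "default"

def write_story(game, plays):
    """Fill the chosen template, then add a sentence about the pivotal play."""
    lead = TEMPLATES[choose_angle(game)].format(
        winner=game["winner"], loser=game["loser"],
        w=game["winner_runs"], l=game["loser_runs"],
    )
    play = pivotal_play(plays)
    turning = (
        "The turning point was {desc}, which moved the win odds "
        "from {before:.0%} to {after:.0%}."
    ).format(desc=play["description"], before=play["before"], after=play["after"])
    return lead + " " + turning

# Made-up game data for illustration.
game = {"winner": "Hawks", "loser": "Owls",
        "winner_runs": 5, "loser_runs": 4, "winner_trailed": True}
plays = [
    {"description": "a leadoff walk in the second", "before": 0.50, "after": 0.53},
    {"description": "a two-run double in the seventh", "before": 0.40, "after": 0.60},
]
story = write_story(game, plays)
print(story)
```

The 40-to-60 percent swing mirrors the example in the excerpt: the play that moves the odds the most gets singled out as the dramatic moment.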
Jeremy Kun, a mathematics PhD student at the University of Illinois at Chicago, has posted a wonderful primer on probability theory for programmers on his blog. It's a subject vital to machine learning and data-mining, and it's at the heart of much of the stuff going on with Big Data. His primer is lucid and easy to follow, even for math ignoramuses like me.
For instance, suppose our probability space is $\Omega = \{1, 2, 3, 4, 5, 6\}$ and $f$ is defined by setting $f(x) = 1/6$ for all $x \in \Omega$ (here the “experiment” is rolling a single die). Then we are likely interested in more exquisite kinds of outcomes; instead of asking the probability that the outcome is 4, we might ask what is the probability that the outcome is even? This event would be the subset $\{2, 4, 6\}$, and if any of these are the outcome of the experiment, the event is said to occur. In this case we would expect the probability of the die roll being even to be 1/2 (but we have not yet formalized why this is the case).
As a quick exercise, the reader should formulate a two-dice experiment in terms of sets. What would the probability space consist of as a set? What would the probability mass function look like? What are some interesting events one might consider (if playing a game of craps)?
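Programmers may find it easier to see this in code. The sketch below is one possible formulation, not taken from Kun's primer: the names `probability`, `die`, and `two_dice` are assumptions, and the dice are assumed fair. A probability space is a dict from outcomes to their mass, and an event is just a set of outcomes.

```python
from fractions import Fraction
from itertools import product

def probability(event, mass):
    """Sum the probability mass over every outcome in the event."""
    return sum(mass[outcome] for outcome in event)

# One fair die: six outcomes, each with mass 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
even = {2, 4, 6}
p_even = probability(even, die)

# Two fair dice: the outcomes are ordered pairs, each with mass 1/36.
two_dice = {pair: Fraction(1, 36) for pair in product(range(1, 7), repeat=2)}

# A craps-flavored event: the two faces sum to 7 or 11.
seven_or_eleven = {pair for pair in two_dice if sum(pair) in (7, 11)}
p_natural = probability(seven_or_eleven, two_dice)

print(p_even)     # 1/2
print(p_natural)  # 2/9
```

Using `Fraction` keeps the arithmetic exact, so the answers come out as 1/2 and 2/9 and not as rounded floats.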
Last month, I blogged about Relatively Prime, a beautifully produced, crowdfunded free series of math podcasts. I just listened to the episode on Chinook (MP3), the program that became the world champion of checkers.
Chinook's story is a bittersweet and moving tale, a modern account of John Henry and the steam-drill, though this version is told from the point of view of the machine and its maker, Jonathan Schaeffer, a University of Alberta scientist who led the Chinook team. Schaeffer's quest begins with an obsessive drive to beat reigning checkers champ Marion Tinsley, but as the tale unfolds, Tinsley becomes more and more sympathetic, so that by the end, I was rooting for the human.
This is one of the best technical documentaries I've heard, and I heartily recommend it to you.
Kara, a disturbing short film about a self-aware robot, was made by games studio Quantic Dream to demonstrate the "expressive power" of the PS3's graphics. In order to sidestep the limitations of animating human characters (the so-called, contentious "uncanny valley"), the creators made a story about a newborn, intelligent robot -- a character that is supposed to be subtly unconvincing in its humanity.
"Our goal at the time with The Casting was to use the game engine to see how we could convey different emotions," Cage tells us prior to the GDC talk where he's unveiling a slice of what Quantic Dream has been up to since 2010. "We wanted to see what it would take in terms of the technology but also with the acting, and working with the actor on-stage to have this performance coming across in the game engine. We learned so much doing it for Heavy Rain, from the good things that worked very well but also from the mistakes that we made, and things we could have done differently.
"When Heavy Rain was over, we thought why not do exactly the same thing and do a short sequence in real-time, in the game engine to see how our next game is going to benefit from what we're going to learn?"
"In Kara, you can't imagine the same scene having the same impact as someone who's not a talented actor. Technology becomes more precise and detailed and gives you more subtleties, so you need talent now. I'm not talking about getting a name in your game - I'm talking about getting talent in your game to improve the experience and get emotion in your game."
Welcome to Kara, the product of Quantic Dream's recent work on the PlayStation 3, and of its investment in new motion capture facilities. Again it's a one-woman show built around a slow tonal shift, again channelled through a strong and actorly central performance - but the distance between Kara and The Casting is as good a measure as any of the technical progress we've seen this generation, and of a shift in ambition and capability within Quantic Dream.
As interesting as this is as a technology demo, I think its real value is in the questions raised by the story and the storytelling choices. The unsettling poignancy of this clip arises from the gender and form of the robot. It would be interesting to re-render this with the "robot" as a kind of arachnoid assembly-line robot with a gender-neutral voice and see what happens to the film's affect.
Carlos Bueno, author of a kids' book about understanding computers called Lauren Ipsum, describes what happened when the cadre of competing bots that infest Amazon's sales database began to fight viciously with one another over the price of his book. It's a damned weird story.
Before I talk about my own troubles, let me tell you about another book, “Computer Game Bot Turing Test”. It's one of over 100,000 “books” “written” by a Markov chain running over random Wikipedia articles, bundled up and sold online for a ridiculous price. The publisher, Betascript, is notorious for this kind of thing.
It gets better. There are whole species of other bots that infest the Amazon Marketplace, pretending to have used copies of books, fighting epic price wars no one ever sees. So with “Turing Test” we have a delightful futuristic absurdity: a computer program, pretending to be human, hawking a book about computers pretending to be human, while other computer programs pretend to have used copies of it. A book that was never actually written, much less printed and read.
The internet has everything.
This would just be an interesting anecdote, except that bot activity also seems to affect books that, you know, actually exist. Last year I published my children's book about computer science, Lauren Ipsum. I set a price of $14.95 for the paperback edition and sales have been pretty good. Then last week I noticed a marketplace bot offering to sell it for $55.63. “Silly bots”, I thought to myself, “must be a bug”. After all, it's print-on-demand, so where would you get a new copy to sell?
Then it occurred to me that all they have to do is buy a copy from Amazon, if anyone is ever foolish enough to buy from them, and reap a profit. Lazy evaluation, made flesh. Clever bots!
Then another bot piled on, and then one based in the UK. They started competing with each other on price. Pretty soon they were offering my book below the retail price, and trying to make up the difference on "shipping and handling". I was getting a bit worried.
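To see how a price war like this can run below retail, here is a toy simulation. Everything about the bots is an assumption: the real repricing rules are not public, and the `price_war` function, the one-cent undercut, the $3.99 shipping fee, and the second and third opening bids are invented for illustration. Only the $14.95 list price and the $55.63 opening offer come from the story. Each bot undercuts its cheapest rival by a cent, down to a floor set by what it can recoup on shipping.

```python
def price_war(opening_cents, floor_cents, undercut_cents=1, max_rounds=10_000):
    """Bots take turns undercutting the cheapest rival until nobody can go lower."""
    prices = list(opening_cents)
    for _ in range(max_rounds):
        changed = False
        for i in range(len(prices)):
            rivals = prices[:i] + prices[i + 1:]
            # Undercut the cheapest rival, but never drop below the floor.
            target = max(floor_cents, min(rivals) - undercut_cents)
            if target < prices[i]:
                prices[i] = target
                changed = True
        if not changed:
            break  # every bot is at its floor or already cheapest
    return prices

RETAIL = 1495        # $14.95 list price, from the story
SHIPPING_FEE = 399   # hypothetical "shipping and handling" charge
floor = RETAIL - SHIPPING_FEE  # lowest price at which a bot still breaks even

# $55.63 is the first bot's offer from the story; the other two are made up.
final = price_war([5563, 5200, 4900], floor)
print([f"${p / 100:.2f}" for p in final])
```

Under these assumptions all three bots end up at the same floor, well below the list price, with the gap recovered through the shipping charge.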
Sidebar: Lauren Ipsum sounds so interesting, I've just ordered a copy to read to my daughter!