Michael I. Jordan is an extremely accomplished computer scientist who is also deeply skeptical of claims made by Big Data advocates, and of people who believe that machine intelligence, AI and machine vision are solved, or nearly so.
If you send your holiday photos to Google's Autoawesome processor, it will snip out the best smiles and poses and combine them to make pictures of scenes that never actually happened.
To inaugurate the publication of his brilliant new book How We Got to Now: Six Innovations That Made the Modern World (also a PBS/BBC TV series), Steven Johnson has written about the difficult balance between reporting on the history of world-changing ideas and the inventors credited with their creation.
Shardcore, who gave us the programmatically generated Hipsterbait tees, has advanced the art of autonomous, self-perpetuating Internet memes with @factbot1, a bot that creates true-sounding, viral-ish lies ("Indonesians always turn left when exiting a cave", "In just one drop of Sesame seeds, 50 million bacteria can be present", "Morels were used as a Sesame seeds substitute during the Norwegian Civil War"). Here's an essay that explains the project:
Eugene Goostman, a program simulating a 13-year-old Ukrainian boy, has attained a 33% success rate at the annual RSA Turing Test competition, meaning that a third of the judges were fooled into thinking that the chatbot was actually a human being. Alan Turing's iconic test was meant to cut through the existentialist crisis in artificial intelligence about what was or wasn't "intelligence" by proposing that if a human being could not distinguish between a person and code in a blind test, the code was intelligent by human standards.
The Goostman bot enjoyed the advantage of simulating someone whose first language wasn't English, and whose apparent young age could explain a lack of nuanced reasoning and basic knowledge, so you could think of this as kind of a cheat, but it's still a very impressive feat.
Expiration Day is William Campbell Powell's debut YA novel, and it's an exciting start. The novel is set in a world in which human fertility has collapsed, taking the birth-rate virtually to zero, sparking riots and even a limited nuclear war as the human race realizes that it may be in its last days. Order is restored, but at the price of basic civil liberties. There's a little bit of Orwell (a heavily surveilled and censored Internet); but mostly, it's all about the Huxley. The major locus of control is a line of robotic children -- all but indistinguishable from flesh-and-bloods, even to themselves -- who are sold to desperate couples as surrogates for the children they can't have, calming the existential panic and creating a surface veneer of normalcy.
Expiration Day takes the form of a private diary of Tania, an 11-year-old vicar's daughter in a small village outside of London. Tania's father's parishioners have found religion, searching for meaning in their dying world. He is counsellor and father-figure to them, though the family is still relatively poor. Tania is a young girl growing up in the midst of a new, catastrophic normal, the only normal she's ever known, and she's happy enough in it. But then she discovers that she, too, is a robot, and has to come to grips with the fact that her "parents" have been lying to her all her life. What's more, the fact that she's a robot means that she won't live past 18: all robots are property of a private corporation, and are merely leased to their "parents," and are recalled around their 18th birthday, turned into scrap.
Dave from the Electronic Frontier Foundation writes, "Seven years ago, the U.S. Army launched the SGT STAR program, which uses a virtual recruiter (an AI chatbot) to talk to potential soldiers. We put in a FOIA request for a bunch of documents related to the program, including current and historical input/output scripts. So far, the Army Research and Marketing Group--which is supposed to help with transparency--hasn't responded."
It's watching us, and this is what it sees. Mike Pelletier explores quantified emotions in software, in collaboration with Subbacultcha! and Pllant / Marieke van Helden [Video Link]
JanusNode sez, "JanusNode is 'a user-configurable dynamic textual projective surface,' AKA a programmable text generating application. It has released a book entitled 'You can bring an elephant to a Broadway show, but you cannot make it drink Chablis: 365 computer-generated excuses to converse', self-published (copyright-free) through Lulu.com. As the title suggests, the book consists of 365 automatically-generated (but human-curated) topics for discussion, ranging from the bizarre to the profound. The rule set for generating the discussion topics ships with JanusNode (among many other rule sets), which is free from JanusNode.com, so you can also generate and choose your own discussion topics if you don't want to spring for the printed pre-curated set."
You can bring an elephant to a Broadway show, but you cannot make it drink Chablis: 365 computer-generated excuses to converse
A computer scientist and a psychology professor analyze Entropica, the artificial intelligence system that's been getting major buzz in the blogosphere. Quick version: It's a good idea, but it underestimates the complexity of the real world. Sure, you could create an AI that can play chess, but that same bot won't necessarily have the skills it needs to understand grammar and sentence structure.
Charlie Warzel: "THIS is what google's self driving car can see. So basically this thing is going to destroy us all." [via Matt Buchanan]
A "Snowball" is a poem "in which each line is a single word, and each successive word is one letter longer." Nossidge built an automated Snowball generator that uses Markov Chains, pulling text from Project Gutenberg. It's written in C++, with code on GitHub. The results are rather beautiful poems (these ones are "mostly Dickens"):
Snowball (also called a Chaterism)
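To make the technique concrete, here is a minimal Python sketch of a Markov-chain Snowball generator. It is not Nossidge's C++ code: the function names, the first-order (single-word) chain, and the tiny sample text are all assumptions chosen for illustration, and a real run would feed in a full Project Gutenberg text instead.

```python
import random
import re
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that directly follow it."""
    words = re.findall(r"[a-z]+", text.lower())
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def snowball(chain, rng, max_lines=8):
    """Grow a poem one word per line, each word one letter longer than the last."""
    starts = sorted(w for w in chain if len(w) == 1)
    if not starts:
        return []
    poem = [rng.choice(starts)]
    while len(poem) < max_lines:
        previous = poem[-1]
        # Only keep followers that are exactly one letter longer.
        candidates = [w for w in chain.get(previous, []) if len(w) == len(previous) + 1]
        if not candidates:
            break
        poem.append(rng.choice(candidates))
    return poem


if __name__ == "__main__":
    # Hypothetical sample text standing in for a Gutenberg book.
    sample = "I am the only child. I do not know."
    print("\n".join(snowball(build_chain(sample), random.Random())))
```

Because each step only looks at the previous word, the poems stay locally plausible while drifting in meaning, which is most of their charm.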
Farhad Manjoo: "Google has a single towering obsession: It wants to build the Star Trek computer."
In Wired, Steven Levy has a long profile of the fascinating field of algorithmic news-story generation. Levy focuses on Narrative Science, and its competitor Automated Insights, and discusses how the companies can turn "data rich" streams into credible news-stories whose style can be presented as anything from sarcastic blogger to dry market analyst. Narrative Science's cofounder, Kristian Hammond, claims that 90 percent of all news will soon be algorithmically generated, but that this won't be due to computers stealing journalists' jobs -- rather, it will be because automation will enable the creation of whole classes of news stories that don't exist today, such as detailed, breezy accounts of every little league game in the country.
Narrative Science’s writing engine requires several steps. First, it must amass high-quality data. That’s why finance and sports are such natural subjects: Both involve the fluctuations of numbers—earnings per share, stock swings, ERAs, RBI. And stats geeks are always creating new data that can enrich a story. Baseball fans, for instance, have created models that calculate the odds of a team’s victory in every situation as the game progresses. So if something happens during one at-bat that suddenly changes the odds of victory from say, 40 percent to 60 percent, the algorithm can be programmed to highlight that pivotal play as the most dramatic moment of the game thus far. Then the algorithms must fit that data into some broader understanding of the subject matter. (For instance, they must know that the team with the highest number of “runs” is declared the winner of a baseball game.) So Narrative Science’s engineers program a set of rules that govern each subject, be it corporate earnings or a sporting event. But how to turn that analysis into prose? The company has hired a team of “meta-writers,” trained journalists who have built a set of templates. They work with the engineers to coach the computers to identify various “angles” from the data. Who won the game? Was it a come-from-behind victory or a blowout? Did one player have a fantastic day at the plate? The algorithm considers context and information from other databases as well: Did a losing streak end?
Then comes the structure. Most news stories, particularly about subjects like sports or finance, hew to a pretty predictable formula, and so it’s a relatively simple matter for the meta-writers to create a framework for the articles. To construct sentences, the algorithms use vocabulary compiled by the meta-writers. (For baseball, the meta-writers seem to have relied heavily on famed early-20th-century sports columnist Ring Lardner. People are always whacking home runs, swiping bags, tallying runs, and stepping up to the dish.) The company calls its finished product “the narrative.”
Both companies claim that they'll be able to make sense of less-quantifiable subjects in the future, and will be able to generate stories about them, too.
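The pipeline Levy describes (data, subject rules, an "angle," then templates) can be sketched in a few lines of Python. This is a toy illustration only, not Narrative Science's engine: the `Game` fields, the thresholds, the team names and the template wording are all hypothetical, and it assumes the game has no tie.

```python
from dataclasses import dataclass


@dataclass
class Game:
    home: str
    away: str
    home_runs: int
    away_runs: int
    # Hypothetical model output: home team's win probability before the
    # first play (index 0) and after each play i (index i).
    home_win_prob: list


def pick_angle(game):
    """Subject rules: most runs wins; then classify the kind of win."""
    home_won = game.home_runs > game.away_runs
    winner_prob = [p if home_won else 1 - p for p in game.home_win_prob]
    if abs(game.home_runs - game.away_runs) >= 6:  # assumed blowout threshold
        return "blowout"
    if min(winner_prob) < 0.25:  # winner was once a heavy underdog
        return "comeback"
    return "routine"


def pivotal_play(game):
    """Number of the play that produced the largest swing in win probability."""
    probs = game.home_win_prob
    swings = [abs(after - before) for before, after in zip(probs, probs[1:])]
    return swings.index(max(swings)) + 1


# Stand-ins for the vocabulary a "meta-writer" would supply.
TEMPLATES = {
    "blowout": "{winner} routed {loser} {hi}-{lo}.",
    "comeback": "{winner} rallied from behind to beat {loser} {hi}-{lo}.",
    "routine": "{winner} beat {loser} {hi}-{lo}.",
}


def write_story(game):
    home_won = game.home_runs > game.away_runs
    winner, loser = (game.home, game.away) if home_won else (game.away, game.home)
    hi, lo = max(game.home_runs, game.away_runs), min(game.home_runs, game.away_runs)
    lead = TEMPLATES[pick_angle(game)].format(winner=winner, loser=loser, hi=hi, lo=lo)
    return f"{lead} The turning point came on play {pivotal_play(game)}."


if __name__ == "__main__":
    game = Game("Hawks", "Owls", 5, 4, [0.5, 0.4, 0.2, 0.6, 0.9])
    print(write_story(game))
    # Hawks rallied from behind to beat Owls 5-4. The turning point came on play 3.
```

Swapping the `TEMPLATES` dictionary is how a system like this could change voice, from dry analyst to sarcastic blogger, without touching the analysis.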
Can an Algorithm Write a Better News Story Than a Human Reporter?
Jeremy Kun, a mathematics PhD student at the University of Illinois at Chicago, has posted a wonderful primer on probability theory for programmers on his blog. It's a subject vital to machine learning and data-mining, and it's at the heart of much of the stuff going on with Big Data. His primer is lucid and easy to follow, even for math ignoramuses like me.
For instance, suppose our probability space is $\Omega = \{1, 2, 3, 4, 5, 6\}$ and $f$ is defined by setting $f(x) = 1/6$ for all $x \in \Omega$ (here the “experiment” is rolling a single die). Then we are likely interested in more exquisite kinds of outcomes; instead of asking the probability that the outcome is 4, we might ask what is the probability that the outcome is even? This event would be the subset $\{2, 4, 6\}$, and if any of these are the outcome of the experiment, the event is said to occur. In this case we would expect the probability of the die roll being even to be 1/2 (but we have not yet formalized why this is the case).
As a quick exercise, the reader should formulate a two-dice experiment in terms of sets. What would the probability space consist of as a set? What would the probability mass function look like? What are some interesting events one might consider (if playing a game of craps)?
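For programmers, the single-die example and one possible answer to the two-dice exercise can be written out directly. The sketch below is an illustration rather than anything from Kun's primer: it assumes fair dice, represents the probability mass function as a dictionary, and uses a hypothetical helper `prob` that sums the mass over an event. The craps event chosen is a "natural" (a total of 7 or 11).

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)

# Single die: every outcome gets mass 1/6.
single_pmf = {face: Fraction(1, 6) for face in FACES}

# Two dice: the space is all ordered pairs, each with mass 1/36.
two_dice_pmf = {pair: Fraction(1, 36) for pair in product(FACES, repeat=2)}


def prob(event, pmf):
    """Probability of an event (a set of outcomes) under a mass function."""
    return sum((pmf[outcome] for outcome in event), Fraction(0))


even = {2, 4, 6}
natural = {pair for pair in two_dice_pmf if sum(pair) in (7, 11)}

if __name__ == "__main__":
    print(prob(even, single_pmf))       # 1/2
    print(prob(natural, two_dice_pmf))  # 2/9
```

Using `Fraction` keeps the arithmetic exact, so the masses sum to precisely 1 instead of a floating-point approximation.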
Probability Theory — A Primer
(Image: Dice, a Creative Commons Attribution (2.0) image from artbystevejohnson's photostream)