Algorithmically constructed news

In Wired, Steven Levy has a long profile of the fascinating field of algorithmic news-story generation. Levy focuses on Narrative Science and its competitor Automated Insights, and discusses how the companies can turn "data rich" streams into credible news stories whose style can range from sarcastic blogger to dry market analyst. Narrative Science's cofounder, Kristian Hammond, claims that 90 percent of all news will soon be algorithmically generated, but that this won't be due to computers stealing journalists' jobs -- rather, it will be because automation will enable the creation of whole classes of news stories that don't exist today, such as detailed, breezy accounts of every little league game in the country.

Narrative Science’s writing engine requires several steps. First, it must amass high-quality data. That’s why finance and sports are such natural subjects: Both involve the fluctuations of numbers—earnings per share, stock swings, ERAs, RBI. And stats geeks are always creating new data that can enrich a story. Baseball fans, for instance, have created models that calculate the odds of a team’s victory in every situation as the game progresses. So if something happens during one at-bat that suddenly changes the odds of victory from say, 40 percent to 60 percent, the algorithm can be programmed to highlight that pivotal play as the most dramatic moment of the game thus far. Then the algorithms must fit that data into some broader understanding of the subject matter. (For instance, they must know that the team with the highest number of “runs” is declared the winner of a baseball game.) So Narrative Science’s engineers program a set of rules that govern each subject, be it corporate earnings or a sporting event. But how to turn that analysis into prose? The company has hired a team of “meta-writers,” trained journalists who have built a set of templates. They work with the engineers to coach the computers to identify various “angles” from the data. Who won the game? Was it a come-from-behind victory or a blowout? Did one player have a fantastic day at the plate? The algorithm considers context and information from other databases as well: Did a losing streak end?

Then comes the structure. Most news stories, particularly about subjects like sports or finance, hew to a pretty predictable formula, and so it’s a relatively simple matter for the meta-writers to create a framework for the articles. To construct sentences, the algorithms use vocabulary compiled by the meta-writers. (For baseball, the meta-writers seem to have relied heavily on famed early-20th-century sports columnist Ring Lardner. People are always whacking home runs, swiping bags, tallying runs, and stepping up to the dish.) The company calls its finished product “the narrative.”
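
To make the steps in the excerpt concrete, here is a minimal sketch in Python of how such a pipeline might hang together: find the play with the biggest win-probability swing, pick an "angle" from the score, then fill in a template written in advance. This is not Narrative Science's actual code. The names (`Play`, `pivotal_play`, `pick_angle`, `write_recap`), the thresholds for a "comeback" or a "blowout", and the template wording are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Play:
    description: str
    win_prob_before: float  # winner's estimated chance of victory before the play
    win_prob_after: float   # the same estimate after the play


def pivotal_play(plays):
    """Return the play with the largest swing in win probability."""
    return max(plays, key=lambda p: abs(p.win_prob_after - p.win_prob_before))


def pick_angle(winner_runs, loser_runs, winner_trailed_by):
    """Choose a story angle. The thresholds are illustrative assumptions."""
    if winner_trailed_by >= 3:
        return "comeback"
    if winner_runs - loser_runs >= 6:
        return "blowout"
    return "close_game"


# Hypothetical "meta-writer" templates, one per angle.
TEMPLATES = {
    "comeback": "{winner} rallied past {loser} {w}-{l}. The turning point: {play}.",
    "blowout": "{winner} routed {loser} {w}-{l}. The turning point: {play}.",
    "close_game": "{winner} edged {loser} {w}-{l}. The turning point: {play}.",
}


def write_recap(winner, loser, w, l, winner_trailed_by, plays):
    """Pick an angle, find the key play, and fill in the matching template."""
    angle = pick_angle(w, l, winner_trailed_by)
    key = pivotal_play(plays)
    return TEMPLATES[angle].format(
        winner=winner, loser=loser, w=w, l=l, play=key.description
    )


plays = [
    Play("a leadoff walk", 0.50, 0.53),
    Play("a two-run double in the seventh", 0.40, 0.60),
]
print(write_recap("Tigers", "Cubs", 5, 4, 0, plays))
```

The 40-to-60-percent swing from the excerpt is what makes the double the "pivotal" play here. A real system would presumably draw on far richer data and many more templates than this toy does.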

Both companies claim that they'll be able to make sense of less-quantifiable subjects in the future, and will be able to generate stories about them, too.

Can an Algorithm Write a Better News Story Than a Human Reporter?



  1. “Narrative Science’s cofounder, Kristian Hammond, claims that 90 percent of all news will soon be algorithmically generated, but that this won’t be due to computers stealing journalists’ jobs — rather, it will be because automation will enable the creation of whole classes of news stories that don’t exist today, such as detailed, breezy accounts of every little league game in the country.”

    I’m sorry, but that’s a contradiction imho.  Part of the reason little league doesn’t get coverage now is because nobody gets paid to cover sports anymore.

    Knowing a little bit about the newspaper business myself, if they manage to build an algorithmic process that can make sense of stories and spit out something that an editor can work with, and they set a price that’s competitive with reporters’ wages, this will absolutely put journalists out of work, and will likely be used to keep newspapers wheezing along for a few more years.

    Maybe the STEM snobs are right; maybe it’s necessary for everyone to get a STEM degree.  Because they’ll be the only people with actual useful job skills.  The rest of us will be, I don’t know, hell, I suppose at some point there will be a tipping point where we won’t even need STEM because society will go off the rails.

    1. Your contradiction doesn’t make sense: if it’s already uneconomic to pay journalists to write about little league then this development doesn’t change anything for journalists.

      1. If you knew how poorly managed modern mass media tends to be, you’d know that’s a mighty big “if”.

        Also…I was at one of the larger publishing companies in the U.S., and one of the side effects of being a public company is that, even if this cranks out terrible dreck and makes subscribers leave in droves, if it boosts that quarter’s profits, it will be implemented. Customer satisfaction and long-term goals are for sissies.

        1. It’s your assertion that little league doesn’t have journalists covering it. If you think there is an economic case to be made for sending a journalist to every game, by all means set up and make a fortune.

      2. Right, but where’s your data feed for little league games?  Do little league umpires’ calls get fed into some RSS-like feed somewhere?

    2. Take a few data feeds and apply Markov chains to them and you’d be surprised how convincingly human (and original) they seem.

      Today’s “news” stories are less about writing for the reader and more about writing for the search engines (read: Google). News is ever more market-driven: if people tend to search for and click on fake Little League stories, the algorithms will simply churn out more of the same.

  2. We were supposed to all be sharing the wealth generated by technological civilization but in fact all that is happening is an amplification of the old style of society that was conditioned by scarcity.  Now of course a lot of the scarcity is simply artificial, the result of those with the power holding on to the goods and good old fashioned stupidity.  

  3. Computers won’t be stealing journalists’ jobs?  I’m sure I trust that analysis.  Why, with computers handling the day-to-day tasks, this will free up real journalists to do real journalism.

    And they totally won’t all get laid off, with the wealth flowing straight to the 1%.  Because this time, it will be different.

    1. Former wage slave at a small newspaper at a newspaper company.  Some of the gushing praise is telling.  Oh, this will mean that things like high school sports will get coverage now!  Yeah…and they used to.  By reporters.  And those people would talk to coaches, players, parents, and so on.  Proud parents and family would eat it up.  Obsessive older folks would eat it up.

      At one particular property I can think of, this will put people out of work who barely make ends meet, who run out to the events as stringers, as a second or third job, getting paid by the inch.  Those guys will be gone, because there’s not a lot of variation there.

      Meanwhile, at this particular company, the company has debt due that numbers in the billions, acquired through a series of mergers and acquisitions (after one such round, they laid off literally hundreds of sports reporters) and their CEO gets an annual bonus which would pay for brand-new iMacs for every employee.  But, oh, they’re hurting. They can’t afford such things.  They’re public, they have to pay for executive talent.  He can’t give it back, he earned it.  It’s the low-level people who must sacrifice.  My former job was one of the higher paying jobs but is still below poverty level (my wife made a lot more than I did) but it was too expensive.  Now they pay an Indian person even less money to do an inferior job, and they wring their hands about how people just don’t want newspapers anymore and how they’ll have to lay more people off.

      My guess is that that company will probably buy into this.  They’ll lay even more people off, ditch the plan from years ago to get community members to participate in a “community blog”, and have an algorithm write the content and keep a couple of salespeople with laptops so they can communicate with out-of-state and out-of-country production people to take money out of the community.  And a hearty round of bonuses for management!

      What grand times we live in!

  4.   Almost all creative work is rule-based and learned rather than “inspired”, so there’s no reason why these “jobs” can’t now be automated.

      Where do you think the unending stream of romance novels comes from? Today meat robots, tomorrow virtual ones…  IBM’s Watson is about to dominate medical practice after winning Jeopardy –

      We really need to redefine our own relevance.  The future is here – “Duck ninnies!”

    1. “Where do you think the unending stream of romance novels comes from?”

      I remember a short story about a person who wrote an A.I. that cranked out great works of literature.  I have beside me a book which I thought contained it, The Mammoth Book of Extreme Science Fiction (Flowers from Alice is in the collection) but I don’t see it.

      1. I believe you’re thinking of “The Great Automatic Grammatizator” by Roald Dahl. It’s a very funny, and, now, very disturbingly prescient story. The device is rapidly adopted by hack authors who sign a contract in exchange for being paid for the use of their names. In the context of the above its last lines are particularly chilling:

        And all the time things get worse for those who hesitate to sign their names.
        This very moment, as I sit here listening to the crying of my nine starving children in the other room, I can feel my own hand creeping closer and closer to that golden contract that lies over on the other side of the desk.
        Give us strength, Oh Lord, to let our children starve.

    2. Allow me to invite an unpicking of the metaphors in that series of assertions. And an unpacking of the implications of both the term “inspired” and its placement in scare quotes. Just to get things started: what rules, learned in what manner and over what period, and embedded in what computational substrate? I leave the working out as an exercise for the student.

  5. Unless these guys are a lot better than previous algorithmic text experiments, I would be inclined to use the term ‘ink slime’ to describe the resulting somewhat-news-like (but cheap!) slurry.

  6. The Gawker blogger of the future will be a bemused, snarky, and detached computer algorithm forced to defend yet another of Dentonbot’s terrible redesigns. The algorithm will work hard for a year, and then just phone it in with condescending posts about exercise and snide remarks about those clickbait algorithms working for Huffingtonbot.

  7. This is something we were already headed for in 1960, when Victor Yngve published algorithms for using phrase-structured grammars to generate syntactically correct random sentences.  A few years later Sheldon Klein extended this idea to complete narratives, and with his students was generating everything from coherent mystery stories to folk tales (in FORTRAN! using punchcards!).  Klein was convinced that the explosion in the 70s and 80s in automated text analysis and generation was due to funding from organizations whose primary interest was automated wiretapping.
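
For readers curious what generating random sentences from a phrase-structured grammar looks like, here is a small Python sketch. It is a generic context-free expansion, not a reconstruction of Yngve's published procedure or of Klein's story generators. The toy grammar, the word lists, and the function names `generate` and `sentence` are all invented for illustration.

```python
import random

# A toy phrase-structure grammar: each symbol maps to its possible expansions.
# Symbols that do not appear as keys are treated as literal words.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "Adj", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "Adj": [["local"], ["tired"]],
    "N": [["reporter"], ["editor"], ["pitcher"]],
    "V": [["wrote"], ["cheered"], ["watched"]],
}


def generate(symbol, rng):
    """Expand a symbol into a list of words by picking random productions."""
    if symbol not in GRAMMAR:
        return [symbol]  # a terminal: an actual word
    production = rng.choice(GRAMMAR[symbol])
    words = []
    for part in production:
        words.extend(generate(part, rng))
    return words


def sentence(seed=None):
    """Return one random sentence; pass a seed for a repeatable result."""
    rng = random.Random(seed)
    return " ".join(generate("S", rng))


for _ in range(3):
    print(sentence())
```

Every output is grammatical by construction, because the program only ever applies the rules. Whether the result means anything is a separate question, which is roughly the gap the meta-writers' templates and angles are meant to fill.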
