Watson for Oncology isn't an AI that fights cancer; it's an unproven mechanical turk that represents the guesses of a small group of doctors

There are 50 hospitals on 5 continents that use Watson for Oncology, an IBM product that charges doctors to ingest their cancer patients' records, then makes treatment recommendations and suggests journal articles for further reading.

The doctors who use the service assume that it's a data-driven AI that uses records from participating hospitals to build massive data sets of cancer treatments and outcomes and to refine its inferences. That's how IBM advertises it. But that's not how it works.

In reality, Watson for Oncology is a "mechanical turk": a human-driven engine masquerading as an artificial intelligence. It actually works by convening a small panel of cancer experts from Memorial Sloan Kettering Cancer Center, who come up with recommendations for specific patient profiles. These recommendations represent the best guesses of these experts, supported by medical literature and personal experience.

IBM has never allowed an independent study of Watson for Oncology. No follow-up is done to evaluate whether its recommendations help patients.

There are several problems with this approach. First, there is the deceptive marketing of Watson for Oncology to doctors and patients, who believe they are getting a global, data-driven, empirical recommendation, as opposed to the subjective judgment of a small panel of experts.

Then there's the problem with the Sloan Kettering doctors' experience. The doctors work at one of America's top hospitals, which means that they see the kinds of Americans who can afford the most expensive treatments. These Americans' life circumstances, histories, and treatment experiences are massively atypical of many of the people for whom Watson for Oncology will recommend treatments (it's a microcosm of the WEIRD problem in psych research).

Further, Watson for Oncology seems to struggle with ingesting patient records. This is key: if the Sloan Kettering doctors' recommendations have any validity, they are absolutely dependent on being correctly matched with patients' diagnoses and facts. Behind the scenes, IBM has to pay humans to review Watson's matches between diagnoses and the Sloan Kettering profiles.

Finally, there's the lack of independent scrutiny and feedback. In her seminal 2016 book Weapons of Math Destruction, Cathy O'Neil describes the most urgent red flags for automated systems that can go terribly wrong. One of the most important is the lack of a feedback loop. When Amazon uses machine learning to change its page layouts, it measures sales before and after the intervention, to see if it works. Watson doesn't do this: it blithely makes treatment recommendations that could kill people, and no one ever checks to see whether they're any good.

"IBM ought to quit trying to cure cancer," said Peter Greulich, a former IBM brand manager who has written several books about IBM's history and modern challenges. "They turned the marketing engine loose without controlling how to build and construct a product."

Greulich said IBM needs to invest more money in Watson and hire more people to make it successful. In the 1960s, he said, IBM spent about 11.5 times its annual earnings to develop its mainframe computer, a line of business that still accounts for much of its profitability today.

If it were to make an equivalent investment in Watson, it would need to spend $137 billion. "The only thing it's spent that much money on is stock buybacks," Greulich said.

IBM said it created the market for artificial intelligence and is pleased with the pace of Watson's growth, noting that it and other new business units grew by more than $20 billion in the past three years. "It took Facebook and Amazon more than 13 years to grow $20 billion," the company said in a statement.

IBM pitched its Watson supercomputer as a revolution in cancer care. It's nowhere close
[Casey Ross @caseymross and Ike Swetlitz/Statnews]