GPT-3 is able to accurately predict a lot about COVID-19

GPT-3, the deep-learning language model created by OpenAI that's eerily good at writing human-sounding text, was trained on data that ended in October 2019. That means it hasn't been trained on any text about COVID-19; it doesn't "know" anything specifically about this novel coronavirus.

So Thomas Smith, founder of the AI firm Gado Images, decided to get GPT-3 to talk about COVID-19 … and find out what it'd say. He fed it several prompts describing the coronavirus to see how it would autocomplete those utterances.

The result? A mixed bag. It accurately predicted some basic aspects of the disease, for example that it spreads through the air and is worse in people with asthma or diabetes. But it didn't predict the more complex social and political aspects of the disease. It said, for example, that "Caucasians and Asians" would be most affected by COVID-19, so it failed to replicate how longstanding medical and economic biases made Black Americans fare worse. When Smith asked it if Americans would be willing to wear masks to stop the spread, GPT-3 replied that "People in the United States will be willing to wear masks to stop the virus from spreading. The virus is spread through the air, and masks will help prevent the spread of the virus."

Basically, it correctly predicted the facts of the matter — how the virus works — but it predicted how all Americans rationally ought to have behaved, instead of how they actually have.

However, GPT-3 was astonishingly good in a few areas. It predicted that a vaccine would be ready by the fall of 2020, which is roughly correct and not something many experts accurately foresaw.

Even more interestingly, when fed a description of the virus' structure, it predicted mutations that seem close to how Delta works — and then it went on to predict an even scarier possible future mutation:

The system's predictions about Covid-19 variants were surprisingly accurate, too. To prepare GPT-3 for scientific questions about variants, I first handed it a detailed scientific description of the virus' physical structure. I then gave it versions of the prompt "If the virus mutates, expected sites of mutation which would increase virulence include". The system completed my sentence with the text "erythrocyte binding site and the furin cleavage site."

That shocked me. According to Nature, both the highly contagious Delta variant of Covid-19 and the Alpha variant "have altered furin cleavage sites", and this alteration is thought to make the variants "even better at transmitting" than the original virus. GPT-3's statement about furin binding sites appears to line up almost perfectly with the science. Given only a basic description of the virus' structure, GPT-3 essentially successfully predicted the Delta variant.

Even more interesting is the fact that in implicating the "erythrocyte binding site," GPT-3 may be dreaming up a totally new kind of Covid-19 variant. Erythrocytes are cells found in the blood. Although Covid-19 isn't considered a bloodborne virus, it does have major impacts on blood cells, and some evidence suggests that it infects them directly. If the virus mutated to infect blood cells more efficiently and travel through the blood, GPT-3 seems to suggest, this would make it way more virulent than it is today.

As Smith notes, there are enough errors in GPT-3's predictions — and so much blackboxery in these massive predictive-language models — that you wouldn't want to rely on it for medical advice. It's just a pattern-recognition machine.

But because it's doing pattern-recognition that is alien to human ways of making sense, and at a scale impossible for humans to achieve, it could be useful as a pointer — suggesting things human doctors should investigate. That's the productive "centaur" human-machine synthesis that often characterizes our best uses of computation.

Damn interesting stuff, either way.

(CC-2.0-licensed coronavirus image via Yuri Samoilov's Flickr feed)