Google's talking AI is indistinguishable from humans

Tacotron 2 is Google's new text-to-speech system, and as the samples below show, its output sounds indistinguishable from a human speaker.

From Quartz:

The system is Google’s second official generation of the technology, which consists of two deep neural networks. The first network translates the text into a spectrogram, a visual way to represent audio frequencies over time. That spectrogram is then fed into WaveNet, a system from Alphabet’s AI research lab DeepMind, which reads the chart and generates the corresponding audio elements accordingly.
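
The spectrogram is the hand-off point between the two networks, so it helps to see what that intermediate representation actually contains. Below is a minimal Python sketch, under stated assumptions, that computes a magnitude spectrogram with a short-time Fourier transform: the signal is cut into overlapping frames, each frame is tapered with a Hann window, and a DFT gives the strength of each frequency bin. The frame size (256), hop length (128), sample rate (8000 Hz) and the helper names spectrogram and bin_to_hz are all illustrative choices, not anything from Google. Tacotron 2 itself predicts mel-scale spectrograms with a neural network rather than computing them like this, so treat the code only as an illustration of the data format that WaveNet consumes.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann window of length n, used to taper each frame before the DFT."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def spectrogram(signal, frame_size=256, hop=128):
    """Return a magnitude spectrogram as a list of frames (illustrative sketch).

    Rows are time (one per frame), columns are frequency: each frame holds
    frame_size // 2 + 1 magnitudes, from 0 Hz up to the Nyquist frequency.
    """
    window = hann_window(frame_size)
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [s * w for s, w in zip(signal[start:start + frame_size], window)]
        mags = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT coefficient for bin k; O(n^2), acceptable for a demo.
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n, x in enumerate(chunk))
            mags.append(abs(coeff))
        frames.append(mags)
    return frames


def bin_to_hz(k, frame_size, sample_rate):
    """Centre frequency in Hz of DFT bin k."""
    return k * sample_rate / frame_size


if __name__ == "__main__":
    sample_rate = 8000  # samples per second (assumed for the demo)
    # A quarter second of a pure 1000 Hz tone.
    tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(2000)]
    spec = spectrogram(tone)
    first = spec[0]
    peak_bin = max(range(len(first)), key=first.__getitem__)
    print(len(spec), "frames,", len(first), "bins per frame")  # 14 frames, 129 bins per frame
    print(bin_to_hz(peak_bin, 256, sample_rate))  # 1000.0
```

For a pure tone, every frame shows a single bright band at the tone's frequency; speech produces a far richer, changing pattern of bands over time, which is what the second network has to turn back into a waveform.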

Tacotron 2 or Human?

In each pair of examples below, one clip was generated by Tacotron 2 and the other is a recording of a human speaker. Can you tell which is which?
“That girl did a video about Star Wars lipstick.” (clips 1 and 2)
“She earned a doctorate in sociology at Columbia University.” (clips 1 and 2)
“George Washington was the first President of the United States.” (clips 1 and 2)
“I'm too busy for romance.” (clips 1 and 2)

Soundwave image by T-flex/Shutterstock.
