Hacking into your robot vacuum to eavesdrop on your home

It turns out you can hack into a robot vacuum cleaner and use its sensors to eavesdrop.

How? With lasers! Spy agencies since the 1940s have eavesdropped on people using "laser microphones": You shine a laser at the window of a house, which vibrates ever so faintly from the sound of whoever's inside, talking. You use the vibrations thusly recorded from the window to reconstitute precisely what the target was saying. Sneaky; clever.
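To make the idea concrete, here's a toy Python sketch. It is not any spy agency's or researcher's actual method. It assumes the brightness of the reflected laser rises and falls in proportion to the window's tiny vibration, simulates a 440 Hz tone shaking the glass (plus some sensor noise), and then recovers the tone's pitch from the brightness readings with a plain discrete Fourier transform. The sample rate, the tone, and the function names are all hypothetical, made up for illustration.

```python
import cmath
import math
import random

SAMPLE_RATE = 8000  # readings per second (hypothetical sensor rate)
N = 800             # 0.1 seconds of readings, giving 10 Hz frequency resolution


def simulate_reflection(tone_hz, n=N, rate=SAMPLE_RATE, seed=0):
    """Brightness of a laser reflected off a window that a sound is shaking.

    Assumption (for illustration only): brightness = steady baseline
    + a tiny term proportional to the glass's displacement + sensor noise.
    """
    rng = random.Random(seed)
    readings = []
    for i in range(n):
        t = i / rate
        displacement = math.sin(2 * math.pi * tone_hz * t)
        readings.append(1.0 + 0.001 * displacement + rng.gauss(0, 0.0002))
    return readings


def dominant_frequency(readings, rate=SAMPLE_RATE):
    """Return the strongest frequency (in Hz) in the readings via a naive DFT."""
    n = len(readings)
    mean = sum(readings) / n
    centered = [r - mean for r in readings]  # remove the steady baseline
    best_bin, best_power = 0, -1.0
    for k in range(1, n // 2):  # skip bin 0 (the DC offset); stay below Nyquist
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        power = abs(coeff)
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * rate / n


if __name__ == "__main__":
    signal = simulate_reflection(440)
    print(dominant_frequency(signal))  # → 440.0
```

The vibration here is a thousandth of the baseline brightness, yet it still stands out clearly once the readings are broken into frequencies, which is the whole trick behind a laser microphone. Real speech is far messier than a single pure tone, of course.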

These days, we've got plenty of devices shining lasers inside our houses — such as robotic vacuum cleaners. Many new models rapidly swivel a "LIDAR" laser around, bouncing the laser off objects in our house to help the robot navigate obstacles and walls.

So scientists wondered — huh, could you hack into a robot vacuum cleaner, record the laser info, and reconstitute our personal conversations based on the resulting vibrations coming off objects inside our house?

Indeed you can! A little bit, anyway. They figured out how to hack into a Xiaomi "Roborock" vacuum and used its laser readings to recognize a few types of sound, including spoken digits (from "zero" to "nine") and clips of the introductory music from news shows on networks like Fox and CNN. (They describe the exploit in a research paper.)

Why those particular forms of sound? Well, being able to eavesdrop on numbers could let you steal sensitive info like Social Security numbers, the researchers figured. And knowing what TV shows someone watches gives you a glimpse into "the victim's political orientation or viewing preferences".

Some layperson-friendly details on how the technique worked, from the University of Maryland's press release:

… They conducted experiments with two sound sources. One source was a human voice reciting numbers played over computer speakers and the other was audio from a variety of television shows played through a TV sound bar. Roy and his colleagues then captured the laser signal sensed by the vacuum's navigation system as it bounced off a variety of objects placed near the sound source. Objects included a trash can, cardboard box, takeout container and polypropylene bag—items that might normally be found on a typical floor.

The researchers passed the signals they received through deep learning algorithms that were trained to either match human voices or to identify musical sequences from television shows. Their computer system, which they call LidarPhone, identified and matched spoken numbers with 90% accuracy. It also identified television shows from a minute's worth of recording with more than 90% accuracy.

Granted, this exploit isn't easy to pull off yet. It only worked because the researchers pretrained a deep-learning model on sound clips of their target people speaking the digits "zero" to "nine". In the real world, of course, it might be trickier to get your hands on lots of training data of your target speaking.

But it's another reminder that, hey, every "smart" device we own is also a full-fledged computer, often outfitted with shoddy or nonexistent security (and thus super hackable), and loaded with sensors that can hand any intruder a metric truckload of data about our home habits.

But at least the carpet is clean!