Microsoft takes down MS Celeb facial recognition database, 10 million+ pics of ~100,000 faces, maybe yours, scraped under Creative Commons

It's very bad that this database existed, and that copies of it still do, away from public view. Microsoft took it down quietly, and only after the Financial Times shamed the company over it. But it's still good news.

Microsoft has taken down its online database of 10 million or more human faces.

Maybe yours.

The 'MS Celeb' database was first published on the internet in 2016, and Microsoft claimed it was the world's largest publicly available facial recognition data set, containing over 10 million images of nearly 100,000 individual people.

The facial data has been used to train facial recognition systems, including by U.S. military researchers and by various firms in China, SenseTime and Megvii among them.

China uses facial recognition to commit mass human rights abuses against minority populations, including the predominantly Muslim Uighur people and the ethnic Tibetans who live in the region that China calls its Tibet Autonomous Region and the rest of us call China-occupied Tibet.

Stanford and Duke universities also removed facial recognition data after the publication of work by Berlin-based security researcher Adam Harvey. His Megapixels project documents many large data sets, how they are used, and what's at stake for your privacy.

Here's Madhumita Murgia, writing for The Financial Times:

The people whose photos were used were not asked for their consent, their images were scraped off the web from search engines and videos under the terms of the Creative Commons license that allows academic reuse of photos.

Microsoft, which took down the database days after the FT reported on its use by companies, said: "The site was intended for academic purposes. It was run by an employee that is no longer with Microsoft and has since been removed."

Two other data sets have also been taken down since the FT report was published in April, including the Duke MTMC surveillance data set built by Duke University researchers, and a Stanford University data set called Brainwash.

Brainwash used footage of customers in a café called Brainwash in San Francisco's Lower Haight district, taken through a livestreaming camera. Duke did not respond to requests for comment. Stanford said it had removed the data set after a request by one of the authors of a study it was used for. A spokesperson said the university is "committed to protecting the privacy of individuals at Stanford and in the larger community".

All three data sets were uncovered by Berlin-based researcher Adam Harvey, whose project Megapixels documented the details of dozens of data sets and how they are being used.

Microsoft's MS Celeb data set has been used by several commercial organisations, according to citations in AI papers, including IBM, Panasonic, Alibaba, Nvidia, Hitachi, Sensetime and Megvii. Both Sensetime and Megvii are Chinese suppliers of equipment to officials in Xinjiang, where minorities of mostly Uighurs and other Muslims are being tracked and held in internment camps.

Microsoft quietly deletes largest public face recognition data set [FT.com via Techmeme.com]

Microsoft, Duke and Stanford unis have quietly deleted large datasets of people's faces that were pulled from the internet without their consent after the FT reported on their commercil usage in April – https://t.co/gWL3kUc30K

— Madhumita Murgia (@madhumita29) June 6, 2019

Microsoft quietly deletes largest public face recognition data set, Stanford and Duke uni also remove facial recognition data. "You can't make a data set disappear. Once you post it, and people download it, it exists on hard drives all over the world" https://t.co/z2PQoc9i6z

— Robert Went (@went1955) June 6, 2019

If you are working on facial recognition technology, you are almost certainly enabling repressive regimes, either now or in the future. https://t.co/EneMAamY3g

— Ross Grady (@rossgrady) June 6, 2019

I guess "nothing to hide, nothing to fear" is one-way traffic: @Microsoft quietly deletes largest public face recognition data set, @Stanford and @DukeU also remove their #facialrecognition data sets https://t.co/sjAF9gwtUj

— Stephanie Hare (@hare_brain) June 6, 2019

Microsoft included a photo of the face of Shoshana Zuboff, the author of "The Age Of Surveillance Capitalism," in a database used to train the facial recognition systems deployed in China's dystopian surveillance of Uighur Muslims https://t.co/mjp0uloOsC

— Tom Gara (@tomgara) June 6, 2019

U.S. tech giant Microsoft has quietly pulled database of 10 million faces, showing nearly 100,000 individuals, from the internet, @madhumita29 reports. Images have been used to train systems around the world, including by military and Chinese companies. https://t.co/H62YHBqSpW

— Janosch Delcker (@JanoschDelcker) June 6, 2019

Excellent news: Microsoft quietly deletes largest public face recognition data set. Nice to see some principled leadership in tech https://t.co/qnl3PzAIfm via @financialtimes

— Margaret Heffernan (@M_Heffernan) June 6, 2019

Microsoft quietly deleted the world's largest public facial recognition data set. It was being used by Megvii and SenseTime, two Chinese facial recognition companies who have contracts and business relationships in Xinjiang. https://t.co/UPanZcuAKF

— Ryan Mac (@RMac18) June 6, 2019