Researchers think that adversarial examples could help us maintain privacy from machine learning systems

Machine learning systems are good at finding hidden correlations in data and using them to infer potentially compromising information about the people who generate that data. For example, researchers fed an ML system a large set of Google Play reviews by reviewers whose locations were explicitly given in their Google Plus reviews; based on this, the model predicted the locations of other Google Play reviewers with about 44% accuracy. Read the rest
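The flavor of that kind of inference can be sketched with a toy text classifier. This is a minimal Naive Bayes illustration with made-up reviews and cities, not the researchers' actual model: it learns word frequencies from reviewers whose location is known, then guesses the location of a new reviewer from their words.

```python
from collections import Counter, defaultdict
import math

# Toy illustration (not the researchers' actual model): guess a reviewer's
# city from review text, trained on reviewers whose location is known.
# All reviews and cities below are invented for the example.
TRAIN = [
    ("love the subway and bagels here", "nyc"),
    ("took the subway to the bagel shop", "nyc"),
    ("great tacos near the beach", "la"),
    ("beach traffic but amazing tacos", "la"),
]

def train(examples):
    word_counts = defaultdict(Counter)   # city -> word frequencies
    city_counts = Counter()              # city -> number of reviews
    for text, city in examples:
        city_counts[city] += 1
        word_counts[city].update(text.split())
    return word_counts, city_counts

def predict(text, word_counts, city_counts):
    vocab = {w for c in word_counts.values() for w in c}
    best_city, best_score = None, -math.inf
    for city, count in city_counts.items():
        # log P(city) + sum of log P(word|city), with add-one smoothing
        score = math.log(count / sum(city_counts.values()))
        total = sum(word_counts[city].values()) + len(vocab)
        for w in text.split():
            score += math.log((word_counts[city][w] + 1) / total)
        if score > best_score:
            best_city, best_score = city, score
    return best_city

wc, cc = train(TRAIN)
print(predict("grabbed bagels by the subway", wc, cc))  # nyc
```

The real study worked at a much larger scale and with a more sophisticated model, but the principle is the same: innocuous word choices correlate with location strongly enough to de-anonymize people.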

Rage Inside the Machine: an insightful, brilliant critique of AI's computer science, sociology, philosophy and economics

[I ran a review of this in June when the UK edition came out -- this review coincides with the US edition's publication]

Rob Smith is an eminent computer scientist and machine learning pioneer whose work on genetic algorithms has been influential in both industry and the academy; now, in his first book for a general audience, Rage Inside the Machine: The Prejudice of Algorithms, and How to Stop the Internet Making Bigots of Us All, Smith expertly draws connections between AI, neoliberalism, human bias, eugenics and far-right populism, and shows how the biases of computer science and its corporate paymasters have distorted our whole society. Read the rest

Announcement of Tumblr's sale to WordPress classified as pornography by Tumblr's notorious "adult content" filter

Tumblr is being sold to WordPress parent company Automattic for a reported price of "less than $3m," a substantial decline from the $1.1b Yahoo paid for the company in 2013 (Yahoo subsequently sold Tumblr and several other startups it had overpaid for and then ruined to Verizon for more than $4b). Read the rest

Training bias in AI "hate speech detector" means that tweets by Black people are far more likely to be censored

More bad news for Google's beleaguered spinoff Jigsaw, whose flagship project is "Perspective," a machine-learning system designed to catch and interdict harassment, hate-speech and other undesirable online speech. Read the rest

Ad copy written with AI outperformed human-written copy

Which ad copy for a banking service is more effective?

A) “Access cash from the equity in your home.”

or

B) “It’s true—You can unlock cash from the equity in your home.”

If you answered B, you are correct: it performed better with Chase Bank customers than A did.

Answer B was written by a machine learning language model developed by Persado, "a New York-based company that applies artificial intelligence to marketing creative," according to Ad Age.

From the article:

Kristin Lemkau, chief marketing officer of JPMorgan Chase, noted that machine learning can actually help achieve more humanity in marketing. “Persado’s technology is incredibly promising,” she said in a statement. “It rewrote copy and headlines that a marketer, using subjective judgment and their experience, likely wouldn’t have.”

Chase plans to use Persado for the ideation stage of creating marketing copy on display ads, Facebook ads and in direct mail, according to Yuval Efrati, chief customer officer at seven-year-old Persado. He says that the AI company works alongside Chase’s marketing team and its agencies.

[via Digg]

Image: Shutterstock (modified) Read the rest

Open archive of 240,000 hours' worth of talk radio, including 2.8 billion words of machine transcription

A group of MIT Media Lab researchers have published Radiotalk, a massive corpus of talk radio audio comprising 240,000 hours of speech, complete with machine-generated transcriptions and machine-readable metadata. Read the rest

"Intellectual Debt": It's bad enough when AI gets its predictions wrong, but it's potentially WORSE when AI gets it right

Jonathan Zittrain (previously) is consistently a source of interesting insights that often arrive years ahead of their wider acceptance in tech, law, ethics and culture (2008's The Future of the Internet (and how to stop it) is surprisingly relevant 11 years later); in a new long essay on Medium (shorter version in the New Yorker), Zittrain examines the perils of the "intellectual debt" we incur when we rely on machine learning systems to make predictions whose rationale we don't understand: without an underlying theory of those predictions, we can't know their limitations. Read the rest

Scite: a tool to find out if a scientific paper has been supported or contradicted since its publication

The Scite project has a corpus of millions of scientific articles that it has analyzed with deep learning tools to determine whether any given paper has been supported or contradicted by subsequent publications; you can check Scite via the website, or install a browser plugin version (Firefox, Chrome). (Thanks, Josh!) Read the rest

A generalized method for re-identifying people in "anonymized" data-sets

"Anonymized data" is one of those holy grails, like "healthy ice-cream" or "selectively breakable crypto" -- if "anonymized data" is a thing, then companies can monetize their surveillance dossiers on us by selling them to all comers, without putting us at risk or putting themselves in legal jeopardy (to say nothing of the benefits to science and research of being able to do large-scale data analyses and then publish them along with the underlying data for peer review without posing a risk to the people in the data-set, AKA "release and forget"). Read the rest

Eminent psychologists condemn "emotion detection" systems as being grounded in junk science

One of the more extravagant claims made by tech companies is that they can detect emotions by analyzing photos of our faces with machine learning systems. The premise is sometimes dressed up in claims about "micro-expressions" that are below the threshold of human detection, and some vendors have made billions by getting security agencies to let them train officers in "behavior detection" grounded in this premise. Read the rest

Interactive map of public facial recognition systems in America

Evan Greer from Fight for the Future writes, "Facial recognition might be the most invasive and dangerous form of surveillance tech ever invented. While it's been in the headlines lately, most of us still don't know whether it's happening in our area. My organization Fight for the Future has compiled an interactive map that shows everywhere in the US (that we know of) where facial recognition is being used -- but also where there are local efforts to ban it, as has already happened in San Francisco, Oakland, and Somerville, MA. We've also got a tool kit for local residents who want to get an ordinance or state legislation passed in their area." Read the rest

Many of the key Googler Uprising organizers have quit, citing retaliation from senior management

The Googler Uprising was a string of employee actions within Google over a series of issues related to ethics and business practices, starting with the company's AI project for US military drones, then its secretive work on a censored/surveilling search tool for use in China, and then the $80m payout to Android founder Andy Rubin after he was accused of multiple sexual assaults. Read the rest

China's AI industry is tanking

In Q2 2018, Chinese investors sank $2.87b into AI startups; in Q2 2019, it was $140.7m. Read the rest

AI is like a magic trick: amazing until it goes wrong, then revealed as a cheap and brittle effect

I used to be on the program committee for the O'Reilly Emerging Technology conferences; one year we decided to make the theme "magic" -- all the ways that new technologies were doing things that baffled us and blew us away. Read the rest

Computerphile explains the fascinating AI storyteller, GPT-2

GPT-2 is a language model that was trained on 40GB of text scraped from websites that Reddit linked to and that had a karma score of at least three. As the developers at OpenAI describe it, GPT-2 is "a large-scale unsupervised language model which generates coherent paragraphs of text, achieves state-of-the-art performance on many language modeling benchmarks, and performs rudimentary reading comprehension, machine translation, question answering, and summarization—all without task-specific training." Because the model samples probabilistically, it returns a different response every time you enter the same input.

OpenAI decided not to release the full model, citing "concerns about malicious applications of the technology," but it released a smaller version with 345 million parameters, which you can install as a Python program and run from a command line. (The installation instructions are in the DEVELOPERS.md file.) I installed it and was blown away by the human-quality outputs it gave to my text prompts. Here's an example: I prompted it with the first paragraph of Kafka's The Metamorphosis. And this is just with the small 345M-parameter model. OpenAI published a story that the full GPT-2 wrote about unicorns, which shows how well the model performs.
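The different-response-every-time behavior comes from sampling: at each step the model produces a probability distribution over possible next tokens, and the generator draws from that distribution rather than always taking the likeliest token. Here's a minimal sketch of the idea, using a hypothetical hand-written bigram table in place of a real neural model:

```python
import random

# Toy sketch (not GPT-2 itself) of sampled text generation: at every step,
# look up a probability distribution over next tokens and *sample* from it.
# The bigram table below is invented for the example.
BIGRAMS = {
    "the": {"cat": 0.5, "dog": 0.3, "unicorn": 0.2},
    "cat": {"sat": 0.6, "ran": 0.4},
    "dog": {"sat": 0.5, "ran": 0.5},
    "unicorn": {"spoke": 1.0},
}

def sample_next(token, rng):
    """Draw the next token from the distribution for the current token."""
    dist = BIGRAMS[token]
    r, cum = rng.random(), 0.0
    for word, p in dist.items():
        cum += p
        if r < cum:
            return word
    return word  # guard against floating-point rounding

def generate(prompt, steps, rng):
    out = [prompt]
    for _ in range(steps):
        if out[-1] not in BIGRAMS:
            break
        out.append(sample_next(out[-1], rng))
    return " ".join(out)

# Same prompt, different random seeds: different continuations.
print(generate("the", 2, random.Random(1)))
print(generate("the", 2, random.Random(7)))
```

GPT-2's distribution comes from a large neural network conditioned on the whole preceding context rather than a lookup table, but the sampling step is why identical prompts yield different text.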

In this Computerphile video, Rob Miles of the University of Nottingham explains how GPT-2 works. Read the rest

Make: a machine-learning toy on open-source hardware

In the latest Adafruit video (previously) the proprietors, Limor "ladyada" Fried and Phil Torrone, explain the basics of machine learning, with particular emphasis on the difference between computing a model (hard) and implementing the model (easy, and simple enough to run on relatively low-powered hardware), and then they install and run TensorFlow Lite on a small, open-source handheld and teach it to distinguish between someone saying "No" and someone saying "Yes," in just a few minutes. It's an interesting demonstration of the theory that machine learning may be most useful in tiny, embedded, offline processors. (via Beyond the Beyond) Read the rest
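The compute-the-model-hard, run-the-model-easy distinction can be illustrated with a toy perceptron: training makes many passes over the data, but the finished model is just three numbers, and classifying a new input takes a couple of multiply-adds, cheap enough for a microcontroller. The two-feature inputs below are invented stand-ins for audio features of "yes" and "no", not Adafruit's actual code.

```python
# Hypothetical 2-feature inputs standing in for "yes"/"no" audio features.
DATA = [
    ((0.9, 0.1), 1), ((0.8, 0.2), 1), ((0.7, 0.3), 1),   # "yes"
    ((0.1, 0.9), 0), ((0.2, 0.8), 0), ((0.3, 0.7), 0),   # "no"
]

def train(data, epochs=100, lr=0.1):
    """The expensive part: many iterations over the training data."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x0, x1), label in data:
            pred = 1 if w[0] * x0 + w[1] * x1 + b > 0 else 0
            err = label - pred
            w[0] += lr * err * x0
            w[1] += lr * err * x1
            b += lr * err
    return w, b

def classify(w, b, x0, x1):
    """The cheap part: two multiplies, two adds, one compare."""
    return 1 if w[0] * x0 + w[1] * x1 + b > 0 else 0

w, b = train(DATA)
print(classify(w, b, 0.85, 0.15))  # 1 ("yes")
print(classify(w, b, 0.15, 0.85))  # 0 ("no")
```

Real keyword spotting uses a small neural network over spectrogram features, but the asymmetry is the same: train once on a big machine, then ship a few kilobytes of weights to the embedded device.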

Using machine learning to pull Krazy Kat comics out of giant public domain newspaper archives

Joël Franusic became obsessed with Krazy Kat, but was frustrated by the limited availability and high cost of the books anthologizing the strip (some of which were going for $600 or more on Amazon); so he wrote a scraper that would pull down thumbnails from massive archives of pre-1923 newspapers and then identified 100 pages containing Krazy Kat strips to use as training data for a machine-learning model. Read the rest
