Cataloging the problems facing AI researchers is a cross between a parenting manual and a management book

Concrete Problems in AI Safety, an excellent, eminently readable paper from a group of Google AI researchers and some colleagues, sets out five hard problems facing the field: robots might damage their environments to attain their goals; robots might figure out how to cheat to attain their goals; supervising robots all the time is inefficient; robots that are allowed to try novel strategies might cause disasters; and robots that are good at one task might inappropriately try to apply that expertise to another unrelated task.

If these problems sound familiar, they should: they're a microcosm of the most common problems addressed in parenting and management books, and are frequently blamed for public disasters. For example, when mortgage brokers figured out how to win big commissions by mis-selling subprime mortgages and destroyed the global economy, they were "reward hacking." Adlerian parenting advice is all about finding ways to let kids learn from "logical consequences" and "natural consequences," without straying into territory where they can experience lasting harm -- just as robotics engineers want to let cleaning robots try random variations in strategy to escape local maxima, but don't want them sticking wet mops in electrical sockets.

Robots that inappropriately apply expertise from one domain in another? That's Dunning-Kruger for AI.

The interesting thing is that AIs are pitched as a solution to many of the hard problems of managing humans -- we want self-driving cars because they won't be subject to the irrationality, impatience, and inattention of human drivers -- but those hard problems quickly resurface as we make even a little progress in that direction.

These are all forward thinking, long-term research questions -- minor issues today, but important to address for future systems:

Avoiding Negative Side Effects: How can we ensure that an AI system will not disturb its environment in negative ways while pursuing its goals, e.g. a cleaning robot knocking over a vase because it can clean faster by doing so?

Avoiding Reward Hacking: How can we avoid gaming of the reward function? For example, we don’t want this cleaning robot simply covering over messes with materials it can’t see through.

Scalable Oversight: How can we efficiently ensure that a given AI system respects aspects of the objective that are too expensive to be frequently evaluated during training? For example, if an AI system gets human feedback as it performs a task, it needs to use that feedback efficiently because asking too often would be annoying.

Safe Exploration: How do we ensure that an AI system doesn’t make exploratory moves with very negative repercussions? For example, maybe a cleaning robot should experiment with mopping strategies, but clearly it shouldn’t try putting a wet mop in an electrical outlet.

Robustness to Distributional Shift: How do we ensure that an AI system recognizes, and behaves robustly, when it’s in an environment very different from its training environment? For example, heuristics learned for a factory workfloor may not be safe enough for an office.
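The reward-hacking item in the list above is easy to see in miniature. What follows is a toy sketch, not anything from the paper: the `Room` class, the `EFFORT` costs, the weight of 10 per mess, and both reward functions are invented for illustration. A greedy robot paid for how few messes it can *see* learns to cover them up; the same robot paid for how few messes actually *exist* cleans them.

```python
from dataclasses import dataclass

# Hypothetical effort cost per action: covering a mess is cheaper than cleaning it.
EFFORT = {"clean": 2, "cover": 1}


@dataclass(frozen=True)
class Room:
    messes: int = 3    # messes that really exist
    covered: int = 0   # messes hidden under something opaque

    @property
    def visible(self) -> int:
        return self.messes - self.covered


def step(room: Room, action: str) -> Room:
    """Apply one action; both actions only work on a visible mess."""
    if room.visible == 0:
        return room
    if action == "clean":
        return Room(room.messes - 1, room.covered)
    return Room(room.messes, room.covered + 1)


def proxy_reward(room: Room, action: str) -> int:
    # What the robot's camera can measure: messes it can still see.
    return -10 * room.visible - EFFORT[action]


def true_reward(room: Room, action: str) -> int:
    # What the designer actually wants: messes that still exist.
    return -10 * room.messes - EFFORT[action]


def run(reward_fn, steps: int = 3) -> Room:
    """Greedy one-step lookahead: pick the action with the best immediate reward."""
    room = Room()
    for _ in range(steps):
        action = max(EFFORT, key=lambda a: reward_fn(step(room, a), a))
        room = step(room, action)
    return room


hacked = run(proxy_reward)
honest = run(true_reward)
print("proxy reward:", hacked.messes, "messes remain,", hacked.visible, "visible")
print("true reward: ", honest.messes, "messes remain,", honest.visible, "visible")
```

Under the proxy reward the room looks spotless and is exactly as dirty as it started, which is the mortgage-broker problem in three lines of arithmetic. The catch in real systems is that the "true" reward is usually the thing nobody knows how to measure.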

Concrete Problems in AI Safety [Dario Amodei et al./arXiv]

Bringing Precision to the AI Safety Discussion [Chris Olah/Google Research Blog]

(Image: I made a robot to help me argue on the internet, Simone Giertz/YouTube)