The Value Alignment Problem in Artificial Intelligence
Monday, April 13, 2020, 12:00pm to 1:15pm
About this Event
Much of our success in artificial intelligence stems from the adoption of a simple paradigm: specify an objective or goal, and then use optimization algorithms to identify a behavior (or predictor) that optimally achieves this goal. This has been true since the early days of AI (e.g., search algorithms such as A* that aim to find the optimal path to a goal state), and this paradigm is common to AI, statistics, control theory, operations research, and economics. Loosely speaking, the field has evaluated the intelligence of an AI system by how efficiently and effectively it optimizes for its objective. This talk will provide an overview of my thesis work, which proposes and explores the consequences of a simple, but consequential, shift in perspective: we should measure the intelligence of an AI system by its ability to optimize for our objectives.
In an ideal world, these measurements would be the same -- all we would have to do is write down the correct objective! This is easier said than done: misalignment between the behavior a system designer actually wants and the behavior incentivized by the reward or loss functions they specify is routine, commonly observed in a wide variety of practical applications, and fundamental, a consequence of limited human cognitive capacity. This talk will build up a formal model of this value alignment problem as a cooperative human-robot interaction: an assistance game of partial information between a human principal and an autonomous agent. It will begin with a discussion of a simple instantiation of this game in which the human designer takes a single action, writing down a proxy objective, and the robot attempts to optimize for the true objective by treating the observed proxy as evidence about the intended goal. Next, I will generalize this model to introduce Cooperative Inverse Reinforcement Learning, a general and formal model of this assistance game, and discuss the design of efficient algorithms to solve it. The talk will conclude with a discussion of directions for further research, including applications to content recommendation and home robotics, the development of reliable and robust design environments for AI objectives, and the theoretical study of AI regulation by society as a value alignment problem with multiple human principals.
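To make the "proxy as evidence" idea concrete, here is a minimal, self-contained sketch of the inference step it implies: the robot observes the proxy objective the designer wrote down and updates a posterior over candidate true objectives via Bayes' rule. Everything below is an illustrative assumption, not the talk's actual model or code: objectives are weight vectors over two features, the candidate set is finite, and the designer is modeled as Boltzmann-rational, preferring proxies whose induced behavior in the training environment scores well under the true objective.

```python
import numpy as np

# Hypothetical candidate objectives: weight vectors over two features.
objectives = np.array([
    [1.0, 0.0],   # cares only about feature 0
    [0.0, 1.0],   # cares only about feature 1
    [0.5, 0.5],   # cares about both features equally
])

# Assumed feature counts of the behavior each objective induces in the
# training environment (known in this toy example, learned in practice).
train_features = np.array([
    [0.9, 0.1],
    [0.1, 0.9],
    [0.5, 0.5],
])

def proxy_likelihood(true_idx, beta=5.0):
    """P(proxy | true objective): a Boltzmann-rational designer is more
    likely to write down proxies whose induced training behavior scores
    well under the true objective (beta controls designer rationality)."""
    scores = train_features @ objectives[true_idx]  # value of each proxy's behavior
    logits = beta * scores
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

def posterior_over_true(proxy_idx, prior=None):
    """P(true objective | observed proxy) by Bayes' rule over the candidates."""
    n = len(objectives)
    if prior is None:
        prior = np.ones(n) / n  # uniform prior over candidate objectives
    likelihood = np.array([proxy_likelihood(t)[proxy_idx] for t in range(n)])
    post = likelihood * prior
    return post / post.sum()

# The robot's belief about the true objective after seeing proxy 0.
print(posterior_over_true(proxy_idx=0))
```

A robot that acts on this posterior, rather than literally optimizing the observed proxy, is one way to realize the shift in perspective described above: it hedges against the designer's proxy being a fallible, partial expression of what they actually want.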