Electrical Engineering Seminars

Provable Multiagent RL in Large State Space

Chi Jin Assistant Professor, Electrical and Computer Engineering, Princeton University.

Oct 15, 2021

Please register here to attend the EE Seminars via Zoom this fall. Note that this is a new link from last year. Once you register, you will receive the Zoom link for all 2021-22 EE Seminar events; you only have to register once to receive a recurring Zoom link for the series. Please click "Add to calendar" after signing up for Google or Outlook reminders.


Multiagent reinforcement learning systems have recently achieved significant success in many AI challenges. Two crucial components behind these successes are function approximation and self-play. Function approximation is commonly deployed in modern applications with large state spaces to approximate either the value function or the policy, while self-play enables the learner to improve by playing against itself rather than against human experts. Although recent progress in RL theory addresses a rich set of RL problems, such successes are mostly restricted to the single-agent setting with limited classes of function approximation. This talk presents our progress on addressing both challenges. We first propose a new algorithm that provably finds a near-optimal policy using a polynomial number of samples for any single-agent RL problem with low Bellman-Eluder dimension---a generic complexity measure that captures a majority of existing tractable RL problems. We then extend these results to two-player zero-sum Markov games, proposing a new self-play algorithm that provably finds the Nash equilibrium policy using a polynomial number of samples. A key component of our self-play algorithm is the exploiter, which facilitates the learning of the main player by deliberately exploiting her weakness.
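To give a flavor of the self-play-with-exploiter idea in the simplest possible setting, here is a minimal sketch (not the speaker's algorithm, and with all names and parameters chosen for illustration): a main player maintains a mixed strategy over a 2x2 zero-sum matrix game (matching pennies) and updates it with multiplicative weights, while an "exploiter" best-responds to her current strategy at every round, deliberately targeting her weakness. In zero-sum matrix games, the main player's time-averaged strategy under such dynamics approaches a Nash equilibrium strategy.

```python
import numpy as np

# Payoff matrix for the row (main) player in matching pennies;
# the exploiter (column player) receives the negated payoff.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def self_play(A, rounds=5000, eta=0.05):
    """Main player: multiplicative weights. Exploiter: best response."""
    n, _ = A.shape
    w = np.ones(n)               # main player's action weights
    avg = np.zeros(n)            # running sum of strategies
    for _ in range(rounds):
        x = w / w.sum()          # current mixed strategy
        j = np.argmin(A.T @ x)   # exploiter picks the column that hurts x most
        w *= np.exp(eta * A[:, j])  # reward actions that did well vs. the exploiter
        avg += x
    return avg / rounds          # average strategy approximates Nash play

x = self_play(A)
```

For matching pennies, the unique Nash equilibrium strategy is (0.5, 0.5), and the averaged strategy `x` lands close to it; the exploiter's best responses are exactly what forces the main player away from any exploitable strategy.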

Speaker Bio

Chi Jin is an assistant professor of Electrical and Computer Engineering at Princeton University. He obtained his Ph.D. in Computer Science at UC Berkeley, advised by Michael I. Jordan, and received his B.S. in Physics from Peking University. His research lies in theoretical machine learning, with special emphasis on nonconvex optimization and reinforcement learning. His representative work includes proving that noisy gradient descent and accelerated gradient descent escape saddle points efficiently, proving sample complexity bounds for optimistic Q-learning and least-squares value iteration, and designing near-optimal algorithms for minimax optimization.


Na Li


Shalom Okafor