Please register here to join us for the CS Colloquium Series 2020-21 via Zoom. Once you register, you will receive a recurring Zoom link. You only need to register once to be able to attend any of the seminars. 

---

Datasets in the wild are rarely captured by the clean models on which learning theory is built, yet we run our algorithms on them all the same, with no guarantee that their performance will degrade gracefully as we stray from the models originally posited. To complicate matters further, modern ML tasks are typically high-dimensional, and data often arrives in a dynamic, non-stationary fashion. In these settings, designing learning algorithms that work under even the cleanest of models can be challenging, let alone ones that are resilient to the pathologies of real-world data.

This talk will center on developing provable algorithms that handle all of these issues simultaneously. I will instantiate the underlying techniques in the context of regression and contextual bandits, yielding the first efficient and practical algorithms to achieve optimal error guarantees even when a constant fraction of the data has been corrupted. These results provably cannot be obtained using off-the-shelf techniques like Huber regression or least trimmed squares. And importantly, unlike existing guarantees in algorithmic robust statistics, ours do not posit any model for the data-generating process. Our results represent an appealing instance where ideas from theory can lead to genuinely new primitives that perform well on real-world data.
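For concreteness, the sketch below illustrates the baseline setting the abstract refers to: ordinary least squares versus classical Huber regression (via scikit-learn's HuberRegressor) when a constant fraction of the responses has been corrupted. The corruption model, the 20% fraction, and the synthetic data are hypothetical choices made purely for illustration; this is a sketch of the off-the-shelf baselines mentioned above, not of the algorithms presented in the talk.

```python
# Illustrative sketch of the classical baselines (not the speaker's algorithm):
# OLS vs. Huber regression when a constant fraction of responses is corrupted.
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = X @ true_w + 0.1 * rng.normal(size=n)

# Corrupt a constant fraction (here 20%) of the labels with large outliers.
eps = 0.2
bad = rng.choice(n, size=int(eps * n), replace=False)
y[bad] += 50.0 * rng.normal(size=bad.size)

ols = LinearRegression().fit(X, y)
huber = HuberRegressor(epsilon=1.35).fit(X, y)

print("OLS parameter error:  ", np.linalg.norm(ols.coef_ - true_w))
print("Huber parameter error:", np.linalg.norm(huber.coef_ - true_w))
```

In this toy setup Huber regression typically tolerates the outliers better than OLS, but how well it does depends entirely on the (here hypothetical) corruption model; the talk's claim is that, without such modeling assumptions, these off-the-shelf estimators provably cannot reach the optimal error guarantees.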