About this Event
33 Oxford Street, Cambridge, MA 02138
The ability to continuously learn and generalize to new problems quickly is a hallmark of general intelligence. While deep learning has driven progress in many natural language processing tasks, existing language models still require large numbers of in-domain training examples (i.e., input/output pairs that are often costly to annotate), overfit to the idiosyncrasies of particular datasets, and forget previously learned knowledge when trained on new ones. In contrast, humans are able to learn incrementally and accumulate task-agnostic knowledge that lets them pick up new skills quickly without forgetting old ones.
In this talk, I will argue that obtaining such an ability for a language model requires significant advances in how knowledge acquired from textual data is represented, stored, and reused. I will present two methods in this direction: (i) an episodic memory module that allows a neural network language model to continually learn without forgetting even when it encounters shifts in the data distribution (e.g., when switching from news articles to social media posts); and (ii) a self-supervised learning framework that unifies classical and modern word representation learning models, which have been the main driver of progress in transfer learning for natural language processing, and connects them to analogous methods used in other domains (e.g., computer vision, audio processing). I will conclude by briefly discussing a series of future research programs toward building a generally intelligent linguistic agent.
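For readers who want a concrete picture of the episodic-memory idea before the talk, the sketch below shows one common realization: sparse experience replay from a fixed-size buffer of past examples, interleaved with training on an incoming data stream. The names (EpisodicMemory, train_on_stream), the reservoir-sampling write policy, and the replay schedule are illustrative assumptions, not the specific method presented in the talk.

    import random

    class EpisodicMemory:
        """Fixed-size buffer of past (input, target) pairs, maintained by reservoir sampling."""

        def __init__(self, capacity=10000, seed=0):
            self.capacity = capacity
            self.buffer = []
            self.seen = 0
            self.rng = random.Random(seed)

        def write(self, example):
            """Store an example; once full, keep a uniform sample over everything seen so far."""
            self.seen += 1
            if len(self.buffer) < self.capacity:
                self.buffer.append(example)
            else:
                j = self.rng.randrange(self.seen)
                if j < self.capacity:
                    self.buffer[j] = example

        def sample(self, k):
            """Draw up to k stored examples uniformly at random for replay."""
            k = min(k, len(self.buffer))
            return self.rng.sample(self.buffer, k)

    def train_on_stream(stream, train_step, memory, replay_every=100, replay_size=32):
        """Interleave updates on new examples with occasional replay from episodic memory."""
        for step, example in enumerate(stream, start=1):
            train_step([example])            # ordinary update on the incoming example
            memory.write(example)
            if step % replay_every == 0:     # sparse replay to counter forgetting
                train_step(memory.sample(replay_size))

    # Hypothetical usage: train_step would run one gradient update for the language model.
    memory = EpisodicMemory(capacity=5000)
    stream = [("news sentence", "label")] * 500   # stand-in for a distribution-shifting stream
    train_on_stream(stream, train_step=lambda batch: None, memory=memory)

The key design choice in sketches like this is that replay is sparse: the model trains mostly on the current distribution and only periodically revisits a small random subset of earlier examples, which is what keeps the memory cheap while still limiting forgetting.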