Interpreting Training

Naomi Saphra

Mar 20, 2023
9:00 am to 10:00 am | Remote

Interpretability research in natural language processing often follows a predictable pattern: pick an indicator of structure or knowledge, such as probe or challenge set accuracy, measure that indicator in a fully trained model, and assert that this structure or information is integral to how the model functions. However, we can achieve a much deeper understanding by considering how these indicators emerge from the training process, from apparently homogeneous scaling laws to abrupt phase transitions to the significant impact of random factors.


Kempner Institute
