About this Event
Interpretability research in natural language processing often follows a predictable pattern: pick an indicator of structure or knowledge, such as probe accuracy or challenge set accuracy; measure that indicator in a fully trained model; and assert that this structure or information is integral to how the model functions. However, we can achieve a much deeper understanding by considering how these indicators emerge during the training process, from apparently homogeneous scaling laws to abrupt phase transitions to the significant impact of random factors.