Published on February 28, 2018

There are many types of interpretability, from identifying influential features and data points to learning disentangled representations. Which of these are most relevant for building safe AI systems? We will examine how different safety problems benefit from different types of interpretability, and which questions interpretability researchers can focus on to help advance AI safety.
Recorded: December 9, 2017

