Comment by pbmango

3 days ago

This is very interesting. I don't see much discussion of interpretability in the day-to-day discourse of AI builders. I wonder if everyone assumes it either to be solved, or to be too far out of reach to bother stopping and thinking about.

Mostly out of reach. New research on how to do this comes out every day, including both proposals of new techniques and (often strong) critiques of older or recently proposed ones. Interpretability (especially for large, modern models) is very, very far from being a solved problem.

Most interpretability techniques have yet to be shown to be useful in everyday model pipelines. However, the field is working hard to change this.