Comment by yogurt-male

16 hours ago

Mostly out of reach. There is a ton of research on figuring out how to do this coming out every day, including both proposals of new ways to do things and (often strong) critiques of old or recently proposed ways of doing things. Interpretability (esp. for large, modern models) is very, very far from being a solved problem.