Comment by andy12_

3 days ago

This seems really interesting. While Anthropic tried to use dictionary learning over an existing model to try to extract concepts, this almost feels like training the model alongside the dictionary itself (or rather, the model and the dictionary are intertwined).