Comment by libraryofbabel
6 months ago
The papers from Anthropic on interpretability are pretty good. They look at how certain concepts are encoded within the LLM.