Comment by phire
13 hours ago
The NLA also hallucinates, so it still isn't revealing the model's actual "thoughts"; the paper also points out that since the NLA is a full LLM, it can make inferences that aren't actually in the activations.
But it's a useful approximation for auditing.