Comment by pornel
3 months ago
Keep in mind that the "experts" are selected per layer, so it's not even a single expert selection you can correlate with a token, but an interplay of abstract features across many experts at many layers.
3 months ago
Keep in mind that the "experts" are selected per layer, so it's not even a single expert selection you can correlate with a token, but an interplay of abstract features across many experts at many layers.
No comments yet
Contribute on Hacker News ↗