Comment by rao-v
9 hours ago
To the previous poster's point, soft distributions are useful. Even saving only the top 10 logits gives significantly more training signal than the final token alone.
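For anyone who wants to see the difference concretely, here is a rough sketch in plain Python. All function names are made up for illustration. It also assumes the saved top-k logits are renormalized with a softmax over just those k entries, which is only one of several reasonable choices, not something stated above.

```python
import math

def top_k_logits(logits, k=10):
    """Return the k largest logits as (token_id, logit) pairs, largest first."""
    ranked = sorted(enumerate(logits), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def soft_target_loss(student_logits, saved_top_k):
    """Cross-entropy of the student against the teacher's top-k distribution.

    Assumption: teacher probabilities are renormalized over the saved
    top-k entries only; mass outside the top k is ignored.
    """
    token_ids = [tid for tid, _ in saved_top_k]
    teacher_probs = softmax([logit for _, logit in saved_top_k])
    student_log_probs = [math.log(p) for p in softmax(student_logits)]
    return -sum(p * student_log_probs[tid]
                for tid, p in zip(token_ids, teacher_probs))

def hard_target_loss(student_logits, token_id):
    """Cross-entropy against a single token id (the 'final token' case)."""
    return -math.log(softmax(student_logits)[token_id])

# Toy example: a 5-token vocabulary, keeping the teacher's top 3 logits.
teacher_logits = [0.1, 2.0, -1.0, 1.5, 0.3]
student_logits = [0.0, 1.0, 0.0, 0.2, 0.0]
saved = top_k_logits(teacher_logits, k=3)
print(soft_target_loss(student_logits, saved))
print(hard_target_loss(student_logits, saved[0][0]))
```

The hard loss only tells the student which single token was picked. The soft loss also tells it how the teacher ranked and weighted the runners-up, which is the extra signal in question.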