Comment by IronyMan100
5 days ago
Doesn't this make sense, though? LLMs basically learn the low-entropy, highly redundant part of the data. But then a small subset of the training data that completely contradicts the rest of the dataset carries high information (high surprisal), by the very definition of entropy.
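
For concreteness, a minimal sketch of the definitions being invoked here (plain Python, illustrative probabilities only, not taken from any actual model):

    import math

    def surprisal(p):
        # Shannon self-information of an event with probability p, in bits
        return -math.log2(p)

    def entropy(dist):
        # Expected surprisal over a distribution (Shannon entropy), in bits
        return sum(p * surprisal(p) for p in dist if p > 0)

    # A pattern the model sees everywhere vs. a contrarian data point
    print(surprisal(0.99))  # ~0.014 bits: redundant, easy to learn
    print(surprisal(0.01))  # ~6.64 bits: rare/contrary, high information

The point is just that rare, contradicting samples have low probability under the bulk of the data, so their surprisal is high; whether gradient descent actually weights them accordingly is a separate question.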