Comment by hackpert
2 days ago
These metaphorical database analogies bug me and, judging by the comments, a lot of other people too! So far, some of the most reasonable explanations I have found that take training dynamics into account are from Lenka Zdeborova's lab (albeit in toy, linear-attention settings, but it's easy to see why they generalize to practical ones). For instance, this is a lovely paper: https://arxiv.org/abs/2509.24914