Comment by qoez
11 days ago
From a core OpenAI insider who has likely trained very large Markov models and large transformers: https://x.com/unixpickle/status/1935011817777942952
Untwittered: A Markov model and a transformer can both achieve the same loss on the training set. But only the transformer is smart enough to be useful for other tasks. This invalidates the claim that "all transformers are doing is memorizing their training data".
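The memorization point can be seen concretely. Below is a minimal sketch (not from the comment or the tweet): a bigram Markov model that fits its training text perfectly by memorizing transition counts, yet has nothing to say about contexts it has never seen.

```python
# Sketch: a bigram Markov model memorizes P(next char | current char)
# from raw counts. It is maximally confident on its training data but
# cannot handle any context absent from that data.
from collections import defaultdict

def train_bigram(text):
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    # Normalize counts into conditional probabilities.
    return {a: {b: n / sum(nexts.values()) for b, n in nexts.items()}
            for a, nexts in counts.items()}

corpus = "abab"
model = train_bigram(corpus)

print(model["a"]["b"])  # 1.0 — perfect confidence on the training set
print(model.get("c"))   # None — no notion of an unseen context
```

A transformer trained on the same text can reach the same (near-zero) training loss, but its learned representations transfer to other tasks; the table of counts above transfers to nothing.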