Comment by GaggiX
10 days ago
This model was trained on 6T tokens and has a 256k-entry embedding table, quite different from a GPT-2 model of comparable size.