Comment by GaggiX
10 days ago
This model was trained on 6T tokens and has a 256k-entry embedding table, quite different from a GPT-2 model of comparable size.