unleaded 10 days ago
ITT nobody remembers GPT-2 anymore and that makes me sad.

GaggiX 10 days ago
This model was trained on 6T tokens and has 256k embeddings, quite different from a GPT-2 model of comparable size.
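A minimal sketch, assuming the Hugging Face transformers library is installed, of how one can pull up GPT-2's config for comparison (GPT-2's BPE vocabulary is 50,257 entries; the 256k-vocabulary model discussed above is not named here, so no placeholder checkpoint is loaded for it):

    from transformers import AutoConfig

    # GPT-2's published config: 50,257-token vocabulary, 768-dim embeddings (124M model)
    gpt2_cfg = AutoConfig.from_pretrained("gpt2")
    print(gpt2_cfg.vocab_size)  # 50257
    print(gpt2_cfg.n_embd)      # 768
    # A 256k-entry vocabulary makes the embedding matrix roughly 5x larger per
    # embedding dimension, which is a large share of total parameters at this size.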