Comment by nickysielicki
14 days ago
It scales depending on the dataset you want exposure on and the compute you have available, so any specific time box is kind of meaningless if you don’t know the rest of the inputs that went into it. The llama 3 paper went into a lot of this and how these decisions were made (see section 3 and onward): https://ai.meta.com/research/publications/the-llama-3-herd-o...
tl;dr: llama 3 was 54 days, but it’s more complicated than that.
No comments yet
Contribute on Hacker News ↗