Comment by naasking
3 days ago
What's unusual about it? It seems pretty standard to train small models to validate an approach, and then show that the training scales with model size, up to 8B and 14B parameter models, which is what they did.