Comment by highfrequency
10 days ago
Interesting that for these small models, it is optimal for the embedding parameters to be a huge fraction of the total: 170e6 / 250e6 = 68%!