Comment by moffkalast
2 months ago
> trained from scratch on 80B tokens of historical data
How can this thing possibly be even remotely coherent when it was pretrained on only a fine-tuning-sized amount of data?