Comment by sdpmas
5 hours ago
yes, good point. right now it's somewhat hard to overfit, because the meta-optimization extracts only tiny bits of information. but over time, we will switch the validation set to a different random subset of FineWeb, or even to entirely OOD datasets!