Comment by tarruda

10 months ago

Is there any practical method to verify that the model was trained from the reported dataset?

We released 81 intermediate checkpoints covering the whole pretraining phase, along with the code and data needed to reproduce it. So a full audit is certainly possible, though it depends on what you consider 'practical' here.