Comment by weregiraffe

3 months ago

Is the training data open-source? And can you validate that the model was trained on the claimed training data alone? Without this, all benchmarks are useless.

Olmo author here! We release all training data and all our training scripts, plus intermediate checkpoints, so you could take a checkpoint, reproduce a few steps on the training data, and check whether the loss matches.

It’s not a cryptographic proof, and you can’t get perfect determinism on NVIDIA GPUs, but it’s pretty close.
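To make the check concrete, here is a rough sketch of the idea in plain Python. It is a toy, not our actual training code: the model is a one-variable linear regression, and the names `sgd_step`, `replay`, and `losses_match` as well as the `rel_tol` value are made up for illustration. The point is only the shape of the procedure: start from a checkpoint, replay the same batches in the same order, and compare the losses to the logged ones within a tolerance, since bit-exact equality is not realistic on GPUs.

```python
import math


def sgd_step(w, b, batch, lr):
    """One SGD step on mean squared error for y = w*x + b.

    Returns the updated (w, b) and the loss measured before the update.
    """
    n = len(batch)
    errors = [(w * x + b - y, x) for x, y in batch]
    loss = sum(e * e for e, _ in errors) / n
    grad_w = sum(2 * e * x for e, x in errors) / n
    grad_b = sum(2 * e for e, _ in errors) / n
    return w - lr * grad_w, b - lr * grad_b, loss


def replay(checkpoint, batches, lr):
    """Resume from a checkpoint and return the loss at each replayed step."""
    w, b = checkpoint
    losses = []
    for batch in batches:
        w, b, loss = sgd_step(w, b, batch, lr)
        losses.append(loss)
    return losses


def losses_match(logged, reproduced, rel_tol=1e-3):
    """True if every reproduced loss is within rel_tol of the logged loss."""
    return len(logged) == len(reproduced) and all(
        math.isclose(a, r, rel_tol=rel_tol) for a, r in zip(logged, reproduced)
    )


# Hypothetical checkpoint (w, b) and the batches claimed to follow it.
checkpoint = (0.5, 0.0)
claimed_batches = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]

# Stand-in for the published loss log at those steps.
logged_losses = replay(checkpoint, claimed_batches, lr=0.01)

# Replaying the claimed data should reproduce the log.
same_data_ok = losses_match(logged_losses, replay(checkpoint, claimed_batches, lr=0.01))

# Replaying different data should not.
other_batches = [[(1.0, 3.0), (2.0, 5.0)], [(3.0, 7.0), (4.0, 9.0)]]
other_data_ok = losses_match(logged_losses, replay(checkpoint, other_batches, lr=0.01))

print(same_data_ok, other_data_ok)
```

A mismatch here is evidence that the data or the recipe differs from what was claimed. A match only covers the steps you replayed, which is why it falls short of a proof.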