Comment by musebox35

8 hours ago

Training isn’t a single homogeneous step. It starts with pretraining which requires bulk PB of data but you have less quality concerns here. You cover the whole data distribution. Later stages require less and less but increasingly higher quality and complex datasets. The late stage ones are highly curated and might even be sourced from world subject experts. This is where frontier labs with big pockets have the advantage.