Comment by ketchup32613
21 hours ago
Do you want to see scaling curves with respect to data and parameter count? I agree that 1.2B parameters and 10B tokens is not representative, but what scale of parameters and dataset sizes would be convincing?
Not to sound facetious, but perhaps enough runs at different param/token sizings to define a curve?
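
To make "define a curve" concrete, here is a minimal sketch of one way it could be checked: fit a plain power law, loss ≈ a * size^(-b), to a handful of runs in log-log space. The function name `fit_power_law` and the numbers are made up for illustration (synthetic values, not results from any real run), and the form ignores any irreducible-loss term, so treat it as an assumption rather than the method anyone in this thread used.

```python
import numpy as np

def fit_power_law(sizes, losses):
    """Fit loss ~= a * size**(-b) by least squares in log-log space.

    Hypothetical helper for illustration; returns (a, b).
    """
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
    return float(np.exp(intercept)), float(-slope)

# Synthetic, illustrative values only (NOT measurements from a real run):
# four model sizes and losses generated from an exact power law.
sizes = np.array([1e8, 3e8, 1e9, 3e9])
losses = 50.0 * sizes ** -0.1

a, b = fit_power_law(sizes, losses)
print(f"a = {a:.3f}, b = {b:.3f}")
```

Two points always fit a line in log-log space exactly, so it takes at least three or four runs, ideally spread over an order of magnitude or more, before the fit says anything about whether a trend holds. The same fit can be repeated with token counts in place of parameter counts.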