throwaway314155 8 hours ago
That doesn’t tell you if the new method continues to perform better at higher parameter counts.

amelius 7 hours ago
Nor that the training from scratch will even work.