throwaway314155 4 hours ago
That doesn’t tell you if the new method continues to perform better at higher parameter counts.

amelius 3 hours ago
Nor that the training from scratch will even work.