Comment by janalsncm
1 month ago
I don’t disagree, but it’s worth having a look at the changes the LLM did apply.
https://github.com/karpathy/autoresearch/blob/master/progres...
My opinion is you’d have to go pretty far down the x axis before you get to anything beyond tinkering with batch size, learning rate, or positional encodings. There are so many hyperparameter knobs already exposed that duplicating layers is unlikely to be proposed for a long time.
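To give a rough sense of why: even a handful of exposed knobs multiplies into a lot of cheap tweaks to burn through first. A toy sketch (knob names and values are purely illustrative, not the actual speedrun config):

```python
# Hypothetical sketch of the shallow hyperparameter surface an
# automated searcher sees first; names/values are made up here,
# not taken from the actual nanogpt-speedrun code.
search_space = {
    "batch_size": [32, 64, 128],          # "bs"
    "learning_rate": [1e-4, 3e-4, 1e-3],  # "lr"
    "pos_encoding": ["learned", "rope"],  # positional encodings
    "seed": list(range(10)),              # the cheapest "change" of all
}

# Number of distinct runs available before anything structural
# (like duplicating layers) even needs to be considered:
n_shallow_tweaks = 1
for values in search_space.values():
    n_shallow_tweaks *= len(values)

print(n_shallow_tweaks)  # 3 * 3 * 2 * 10 = 180
```

Even this tiny made-up space gives 180 runs of knob-twiddling before structural changes become the obvious next move.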
I also just noticed that the last change it applied was changing the random seed. Lol.
My understanding was that Autoresearch was defined as training from scratch (since it's based on the nanogpt speedrun), with no access to pretrained models. So it couldn't do anything like upcycling a pretrained model or a Frankenmerge, because it's never given such a thing in the first place. (If it could, the speedrun would be pointless: it would mostly benchmark which fileserver can serve a highly compressed pretrained checkpoint the fastest...) It can increase the number of layers for a new architecture+run, but that's not the same thing.
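To make the distinction concrete: growing a fresh, deeper model for a new run is not the same operation as upcycling trained weights into extra layers, because the latter requires a checkpoint to copy from. A toy sketch in plain Python (no real training, all names hypothetical):

```python
import random

def init_layer(seed):
    """Stand-in for one randomly initialized transformer block."""
    rng = random.Random(seed)
    return [rng.gauss(0, 0.02) for _ in range(4)]

# What the speedrun setting allows: a new run with more layers,
# all freshly initialized -- training still starts from scratch.
deeper_from_scratch = [init_layer(seed) for seed in range(8)]

# What upcycling/Frankenmerging would do: duplicate already-trained
# layers from a checkpoint -- impossible without access to one.
trained = [init_layer(seed) for seed in range(6)]  # pretend these are trained
frankenmerged = trained[:3] + trained[1:4] + trained[3:]

print(len(deeper_from_scratch), len(frankenmerged))  # 8 9
```

The first list is just a bigger architecture; the second reuses weights that only exist if a pretrained model is available, which is exactly what the from-scratch rule rules out.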