Comment by gwern
1 month ago
My understanding was that Autoresearch was defined as training from scratch (since it's based on the nanogpt speedrun), not using any pretrained models. So it couldn't do anything like upcycling a pretrained model or the Frankenmerge, because it's not given any access to such a thing in the first place. (If it could, the speedrun would be pointless, as it would mostly benchmark how fast a fileserver you can find to download a highly compressed pretrained model checkpoint from...) It can increase the number of layers for a new architecture+run, but that's not the same thing.