Comment by FooBarWidget

14 hours ago

Can anyone explain to me what a merge is and why that works? It seems utterly bizarre to me that you can just merge weights. You can't make a working program by just merging machine instruction pages. Aren't weights tightly coupled to a specific architecture?


antonvs 14 hours ago

In this case both sets of weights ultimately came from the same model. The Nex model they used is a fine-tune of Qwen, which was the other model they used.

I'm not an expert in this area, but since both sets of weights share the same architecture and the same starting point, it's not too hard to see how a merge like that could turn out OK.
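
To make that a bit more concrete, here is a toy sketch of the simplest kind of merge: a plain linear interpolation between two checkpoints that have identical parameter names and shapes. This is only an illustration, not necessarily what the merge under discussion did. The function name `merge_weights`, the `alpha` parameter, and the tiny example values are all made up, and plain Python lists stand in for real tensors.

```python
def merge_weights(weights_a, weights_b, alpha=0.5):
    """Linearly interpolate two checkpoints that share one layout.

    weights_a, weights_b: dicts mapping parameter name -> flat list of floats.
    alpha: 0.0 returns weights_a, 1.0 returns weights_b.
    (Hypothetical helper for illustration only.)
    """
    # Merging only makes sense when both checkpoints have the same parameters.
    if weights_a.keys() != weights_b.keys():
        raise ValueError("checkpoints have different parameter names")

    merged = {}
    for name, a in weights_a.items():
        b = weights_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for {name}")
        # Each merged value is a weighted average of the two source values.
        merged[name] = [(1 - alpha) * x + alpha * y for x, y in zip(a, b)]
    return merged


# Made-up numbers: "tuned" is a small nudge away from "base",
# standing in for a fine-tune of the same starting model.
base = {"layer.weight": [0.10, -0.20, 0.30], "layer.bias": [0.0]}
tuned = {"layer.weight": [0.12, -0.18, 0.30], "layer.bias": [0.02]}

merged = merge_weights(base, tuned, alpha=0.5)
```

The key check is the one at the top: the merge is only defined when every parameter lines up one-to-one, which is the case when one model is a fine-tune of the other. Two unrelated architectures would fail that check, much like the machine-code analogy in the question.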