Comment by FooBarWidget
14 hours ago
Can anyone explain to me what a merge is and why that works? It seems utterly bizarre to me that you can just merge weights. You can't make a working program by just merging machine instruction pages. Aren't weights tightly coupled to a specific architecture?
In this case both sets of weights ultimately came from the same model. The Nex model they used is a fine-tune of Qwen, which was the other model in the merge.
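I'm not an expert in this area, but it's not too hard to see how a merge like that could turn out OK: a fine-tune generally keeps the base model's architecture, so every weight in one model has a direct counterpart in the other.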
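To make that concrete, here is a toy sketch of the simplest kind of merge, a plain linear interpolation between two checkpoints with identical parameter names and shapes. I don't know which merge method was actually used here; the `merge_weights` function, the `alpha` parameter, and the tiny example checkpoints are all made up for illustration.

```python
def merge_weights(base, tuned, alpha=0.5):
    """Linearly interpolate two checkpoints, parameter by parameter.

    Hypothetical helper: each checkpoint is a dict mapping a parameter
    name to a flat list of floats. Real checkpoints hold tensors, but
    the idea is the same.
    """
    if base.keys() != tuned.keys():
        raise ValueError("checkpoints must have the same parameter names")
    merged = {}
    for name in base:
        a, b = base[name], tuned[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for {name}")
        # alpha = 0 gives the base model back, alpha = 1 gives the fine-tune.
        merged[name] = [(1 - alpha) * x + alpha * y for x, y in zip(a, b)]
    return merged


# Made-up two-parameter "models" that share one architecture.
base = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.0]}
tuned = {"layer0.weight": [3.0, 2.0], "layer0.bias": [1.0]}

merged = merge_weights(base, tuned, alpha=0.5)
print(merged)
```

This only makes sense because the parameters line up one-to-one. Two unrelated models would fail the name and shape checks, and even if the shapes happened to match, position i in one model would mean something different from position i in the other.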