Comment by DonsDiscountGas
1 day ago
I didn't know model merging like that was possible. (Obviously possible from a pure software standpoint but I'm surprised it's effective)
As another poster above linked, it’s been shown to be effective since 2022: https://arxiv.org/abs/2203.05482
It works because Nex N2 is also a derivative of the original base Qwen model, so the two sets of weights share the same architecture and starting point. If they were two completely unrelated models, it wouldn't work.
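To make that concrete, here's a toy sketch of the simplest kind of merge, plain weight averaging. It assumes both checkpoints have identical parameter names and shapes, which is what you get when both are fine-tunes of the same base. `average_weights` and the toy checkpoints are made up for illustration; flat Python lists stand in for real tensors, and this is not the recipe of any particular merge tool.

```python
def average_weights(state_a, state_b, alpha=0.5):
    """Linearly interpolate two checkpoints: (1 - alpha) * a + alpha * b.

    Only makes sense when both checkpoints share parameter names and shapes,
    e.g. two fine-tunes of the same base model (assumption of this sketch).
    """
    if state_a.keys() != state_b.keys():
        raise ValueError("checkpoints do not share the same parameter names")
    merged = {}
    for name, wa in state_a.items():
        wb = state_b[name]
        if len(wa) != len(wb):
            raise ValueError(f"shape mismatch for {name}")
        # Element-wise interpolation of the two weight vectors.
        merged[name] = [(1 - alpha) * x + alpha * y for x, y in zip(wa, wb)]
    return merged


# Hypothetical toy "checkpoints" derived from the same base.
finetune_a = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.0]}
finetune_b = {"layer0.weight": [3.0, 4.0], "layer0.bias": [1.0]}

soup = average_weights(finetune_a, finetune_b)
print(soup)
```

With unrelated models the parameter names or shapes usually don't even line up, and when they do, averaging weights that were never trained from a common starting point has no reason to land somewhere useful.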
A few years back these used to be called "Frankenstein models"
Even merging a model with itself can work, as shown in the post about how they got to the top of Hugging Face with two GPUs.
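For anyone wondering what a self-merge could even mean, here's a rough sketch. It assumes the layer-stacking flavor, where a contiguous block of layers from one model is repeated to make a deeper model. The linked post's actual recipe isn't reproduced here, and `repeat_block` is a hypothetical helper with strings standing in for real layers.

```python
def repeat_block(layers, start, end, repeats=2):
    """Return a new layer list where layers[start:end] appears `repeats` times in a row."""
    if not 0 <= start < end <= len(layers):
        raise ValueError("invalid block range")
    # Keep everything before the block, repeat the block, keep everything after.
    return layers[:start] + layers[start:end] * repeats + layers[end:]


# Toy 6-layer "model"; each string stands in for a transformer layer.
layers = [f"layer{i}" for i in range(6)]

stacked = repeat_block(layers, 2, 4)
print(stacked)
```

No new weights are trained in this sketch; the result just reuses existing layers in a longer sequence.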