Comment by antonvs
15 hours ago
In this case both sets of weights ultimately came from the same model. The Nex model they used is a fine-tune of Qwen, which was the other model they used.
I'm not an expert in this area, but it's not too hard to see how a merge like that could turn out ok.
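To make the intuition concrete, here is a toy sketch, assuming the merge is a plain linear interpolation of the weights. I don't know which merge method was actually used, and `lerp_merge` and all the numbers below are made up for illustration. If the fine-tune is the base plus some delta, interpolating it with the base just gives the base plus a scaled-down delta, so the result stays close to both parents:

```python
def lerp_merge(a, b, alpha):
    """Element-wise linear interpolation: alpha * a + (1 - alpha) * b."""
    if len(a) != len(b):
        raise ValueError("weight vectors must have the same length")
    return [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]


# Toy stand-ins for model weights (hypothetical values, not real checkpoints).
base = [1.0, -2.0, 0.5]
delta = [0.2, 0.4, -0.1]  # what fine-tuning changed, by assumption
finetuned = [w + d for w, d in zip(base, delta)]

# Merge the base with its own fine-tune, weighting each equally.
merged = lerp_merge(base, finetuned, alpha=0.5)

# The merge is the base plus half of the fine-tuning delta.
expected = [w + 0.5 * d for w, d in zip(base, delta)]
```

Two unrelated models would not share a base like this, so averaging their weights would have no such interpretation.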