Comment by nl
16 hours ago
> The analogy falls apart very quickly. Without the training data, your modifications amount to virtually nothing compared to what these "versions" are, and the idea that you can maintain and improve on these models without the continual support of the company that owns the training data AND harnesses AND in general build instructions is not very credible.
This is completely wrong, and it sort of shows why what you're describing isn't a problem at all.
You can post-train any LLM very easily without access to the original training data.
People do it all the time.
Cursor post-training Kimi K2 is a great example.
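To make the mechanics concrete, here is a toy sketch and nothing more: a two-parameter linear model stands in for an LLM, and the `released` weights, the `post_train` helper, and the `new_data` points are all made up for illustration. It is not how Cursor or anyone else actually post-trains. The only thing it shows is that continuing gradient descent needs the weights plus new data, and never touches whatever data produced the weights in the first place.

```python
# Toy stand-in for "released weights": a linear model y = w*x + b.
# The data that originally produced these weights is NOT available here.
released = {"w": 2.0, "b": 0.0}


def predict(params, x):
    return params["w"] * x + params["b"]


def mse(params, data):
    """Mean squared error of the model on a list of (x, y) pairs."""
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)


def post_train(params, data, lr=0.05, steps=500):
    """Continue plain gradient descent from the given weights, using new data only."""
    w, b = params["w"], params["b"]
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return {"w": w, "b": b}


# Hypothetical new task data following y = 2x + 1 (invented for this example).
new_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

tuned = post_train(released, new_data)
print("loss before:", mse(released, new_data))
print("loss after: ", mse(tuned, new_data))
```

Real post-training swaps the linear model for a transformer and the loop for an SFT or RL pipeline, but the inputs are the same kind of thing: existing weights and your own data.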
> If Qwen decides to stop distributing models for download, you're basically stuck, _even_ if you have unlimited resources, it's not clear how the released weights help you; your best bet is to start almost from scratch.
What are you talking about? You just post-train it.
There is exactly zero difference between before and after they stop distributing it. People don't have access to the training data now (while the weights are still being distributed), and they post-train very successfully.
What would you even use the training data for?
> You can post-train any LLM very easily without access to the original training data.
Are you claiming this is e.g. what Alibaba spends their time doing?
My point is that the usefulness of this is limited _in comparison to what you would get from having their training data AND mechanisms_.