
Comment by b65e8bee43c2ed0

9 hours ago

doesn't it get tiring after a while? using the same (perceived) gotcha, over and over again, for three years now?

no one is ever going to release their training data because it contains every copyrighted work in existence. everyone, even the hecking-wholesome safety-first Anthropic, is using copyrighted data without permission to train their models. there you go.

There is an easy fix already in widespread use: "open weights".

It is very much a valuable thing already; no need to taint it with a false promise.

Though I disagree that it wouldn't be used if it were indeed open source: I might not run it in my home lab today, but at least Qwen and DeepSeek would use and build on what e.g. Facebook was doing with Llama, and they might be pushing the open-weights model frontier forward faster.

  • > There is an easy fix already in widespread use: "open weights"

    They're both correct given how the terms are actually used. We just have to deduce what's meant from context.

    There was a moment, around when Llama was first being released, when the semantics hadn't yet settled. The nutter wing of the FOSS community, to my memory, put forward a hard-line and unworkable definition of open source, and seemed to reject open weights too. So the definition got punted to the closest thing at hand, which was open weights with limited (though unfortunately not zero) use restrictions. At this point, it's a personal preference that's at most polite to respect if you know your audience has one.

  • Yeah, open weights is really good, especially when base model weights (not just the instruction-tuned ones) are released, like here.