Comment by maxloh

1 day ago

I think we should stop calling this type of model open source. They are indeed "open weight." The training code is proprietary and never revealed.

https://github.com/microsoft/VibeVoice/issues/102

70 comments

jcmfernandes 1 day ago

Indeed. We now live in a world where freeware is named open source. We are very sorry, Stallman.

MarsIronPI 1 day ago
If you're going to apologize to Stallman, you should apologize for conflating open source with software freedom. ;D
- jcmfernandes 1 day ago
  
  I totally get you, but this is yet another thick layer away.
- psychoslave 1 day ago
  
  With free libre software, where freedom and liberty are about what the end user is empowered with actually, the software is mostly metonymic. Free software, free society, because there are free people in the middle of course.
  

simonw 1 day ago

I'm reserving that complaint for "open source" models which are released under non-open-source licenses.

I care that I know what I can DO with the project when I see it described as "open source".

yjftsjthsd-h 1 day ago
> I care that I know what I can DO with the project when I see it described as "open source".
Yes, the first of which is that you should be able to build it from source. Which requires the source code, and in this case data.
- simonw 1 day ago
  
  The OSI's take on this is that an open source model can be modified through fine-tuning etc, even if you can't rebuild it from scratch.
  The problem with requiring "build from scratch" for open source models is that the number of interesting models with training data that can be openly licensed is close to zero.
  If you trained your model on an unlicensed scrape of the web you can't release the data under an open source license!
  The Open Source Initiative have a bunch of their thinking around this in their FAQ for the "Open Source AI definition": https://opensource.org/ai/faq#isn-t-training-data-required-t...
  
- rogerrogerr 1 day ago
  
  They’ll never reveal the data, because that would reveal this is all built on stolen work.
  
data-ottawa 1 day ago

That would be “permissive license”
Maybe we should have a little cue card for models: vendor/name, size, open weights, open source, permissive license.
It’s simple enough an idea.

JumpCrisscross 1 day ago

> we should stop calling this type of model open source. They are indeed "open weight”

This ship has sailed. It’s now in the same category as hacker/cracker and the pronunciation of GIF.

andy_ppp 1 day ago

I think you mean GIF.
engeljohnb 21 hours ago
The inventor of GIF didn't begin with a document* clearly laying out what is and isn't to be called a "GIF."
I think it's right to push back whenever a huge tech corporation tries to build goodwill by falsely using terms like "open source."
*https://opensource.org/osd
- keeda 19 hours ago
  
  To be fair, the initiators of the "Open Source" movement also co-opted a term that previously had a much more flexible meaning (and had been around for more than a decade at that point.) Just writing a document attributing specific criteria to a term does not grant one authority over the use of that term.
  Ironically, the roots of the Open Source movement are a direct response to the Free Software movement, largely because the latter was considered too ideological and unfriendly to corporate interests (i.e. monetization).
- JumpCrisscross 21 hours ago
  
  > inventor of GIF didn't begin with a document clearly laying out what is and isn't to be called a "GIF”*
  Neither did the inventors of AI. A third party published a document after corporations went with open weights = open source and a spoiler block in FOSS wanted all training data published.
  > it's right to push back whenever a huge tech corporation tries to build goodwill by falsely using terms like "open source."
  I think it’s counterproductive. Most people only see a squabble, which makes any ensuing points from the open-source community seem silly. Those who care can continue using the more-precise language they choose to.
  Put another way, there is a difference between using terms like cracker and fully spelling out cryptocurrency, and telling people who use hacker and crypto more loosely that they’re wrong. They aren’t wrong and that isn’t meaningful feedback. At the same time, the person using the precise language isn’t wrong either.
  
giancarlostoro 1 day ago
It's the same as GIS, you wouldn't say jizz now would you?
- DoctorOW 1 day ago
  
  I absolutely do, every single time it comes up.
- ziml77 1 day ago
  
  I hadn't thought about how to pronounce GIS, but do you have a problem with the pronunciation of the Japanese Industrial Standards: JIS?
  
- dijksterhuis 1 day ago
  
  i am absolutely going to from now on
- kevin_thibedeau 1 day ago
  
  The developer of the format declared the pronunciation 30+ years ago. It has always been jif.
  
- notabotiswear 1 day ago
  
  I take it that you haven’t met the Arcgees people…
- pardon_me 1 day ago
  
  How do you pronounce giraffe?
  
WarmWash 1 day ago
And "hallucination" which should have been "delusion".
Way early on (spring 2023) people tried to stop it, but no luck.
- MagicMoonlight 1 day ago
  
  Why would it be delusion? It’s making something up which isn’t there and describing it.
  

WhyNotHugo 1 day ago

Devil's advocate here: I can give you a binary of my open source MIT code and never send you the code. The code is still MIT licensed, and open source. You just have no access to it.

That said, I entirely agree that MS is misrepresenting their openness here, which isn’t in the least surprising.

Otek 1 day ago
? Do you know what “source” means in open source? Like, what is the source of the binary? It’s the code. That’s the source in open source.
- freedomben 21 hours ago
  
  I don't disagree, but it is perfectly acceptable per the MIT license, which is an OSI approved license. MIT doesn't require source distribution with the binary (which is why from the developer perspective, it's a more "permissive" license)
  
freedomben 1 day ago

In their defense, most everyone else does the same thing. They still shouldn't do it, but at least they're not the trendsetter here (though they are contributing to the ongoing problem)

btown 1 day ago

At least it's MIT licensed! As much as non-open training data irks me, restrictive licensing irks me more!

cute_boi 1 day ago

What is the problem with restrictive licensing? Don't most of them only kick in once you have 1M users, etc.?

bitvvip 1 day ago

What you said makes a lot of sense. Free software should not be confused with open source

giancarlostoro 1 day ago

I mean, you have "AI" which means just about anything in marketing speak, "Agentic" is kind of becoming similar, hopefully they don't goof that one too badly, would be nice to know what you are trying to sell me. Used to be "Cloud" meant storage not just hosting (I guess it still does).

Then there's "Smart" in front of Car, Phone, TV, and so on... Meaning different things.

I do think "Open Weight" should be more commonly used. On the other hand, there are definitely communities that spring up and build the training and inference infrastructure around open models.

scotty79 1 day ago

Open weights is not exactly right either because we do get source of the software that uses those open weights.

Maybe open inference?

But we often also get source code for fine-tuning the model.

So maybe it's closer to open source than to anything else?

Isn't it a bit like not calling a game open source because the engine tooling used to make it isn't open source and they didn't publish .psd files with asset designs?

jrm4 1 day ago

I'm genuinely torn on this one; I get technically why not, but why I think I have no problem with it is the wishy-washiness of "open source" generally.

As I teach this stuff to people newer to this tech, it's probably just easier and more helpful to refer to the wide array of "stuff you can just download and use yourself" as "open-source" and then after that, go deeper and talk about why Stallman was right, how "Free Software" was first, etc.

ilqr_jb 21 hours ago

[dead]

notabotiswear 1 day ago

Openwashing is the new greenwashing, which, coincidentally, seems to have gone out of fashion a few hundred datacentres ago.

dist-epoch 1 day ago
it was replaced with abundancewashing
- Geezus_42 1 day ago
  
  What is "abundancewashing"?
  