Comment by skissane
5 hours ago
> You can't guarantee that with ai because who knows which training data was used
There are no guarantees in life, but with macOS you can know it is rather unlikely any AI was trained on (recent) Apple proprietary source code – because very little of it has been leaked to the general public – and if it hasn't leaked to the general public, the odds are low any mainstream AI would have been trained on it. Now, significant portions of macOS have been open-sourced – but presumably it is okay for you to use that under its open source license – and if not, you can just compare the AI-generated code to that open source code to evaluate similarity.
It is different for Windows, because there have been numerous public leaks of Windows source code, splattered all over GitHub and other places, and so odds are high a mainstream AI has ingested that code during training (even if only by accident).
But, even for Windows – there are tools you can use to compare two code bases for evidence of copying – so you can compare the AI-generated reimplementation of Windows to the leaked Windows source code, and reject it if it looks too similar. (Is it legal to use the leaked Windows source code in that way? Ask a lawyer–is someone violating your copyright if they use your code to do due diligence to ensure they're not violating your copyright? Could be "fair use" in jurisdictions which have such a concept–although again, ask a lawyer to be sure. And see A.V. ex rel. Vanderhye v. iParadigms, L.L.C., 562 F.3d 630 (4th Cir. 2009))
In fact, I'm pretty sure there are SaaS services you can subscribe to which will do this sort of thing for you, and hence they can run the legal risk of actually possessing leaked code for comparison purposes rather than you having to do it directly. But this is another expense which an open source project might not be able to sustain.
Even for Windows – the vast majority of the leaked Windows code is >20 years old now – so if you are implementing some brand new API, odds of accidentally reusing leaked Windows code is significantly reduced.
Other options: decompile the binary, and compare the decompiled source to the AI-generated source. Or compile the AI-generated source and compare it to the Windows binary (this works best if you can use the exact same compiler, version and options as Microsoft did, or as close to the same as is manageable.)
yknow what would be funny, if a project like ReactOS or WINE relied on the Copilot Copyright Commitment[0] for protection against microsoft lawyers
[0]https://blogs.microsoft.com/on-the-issues/2023/09/07/copilot...
(though they definitely should not lol)