Comment by crimsonnoodle58

3 days ago

So maybe one day we'll see coding agents like Claude Code create and update an ATTRIBUTION.md, citing all the open source projects and their licenses used to generate code in your project?

You got it exactly right :) And you can update the ATTRIBUTION.md so that it does NOT rely on open source projects that have been compromised. Imagine asking Claude Code to write a package/function in the style of a codebase that you care about, or forcing it to ALWAYS rely on some internal packages that you care about. The possibilities are endless when you insert such knobs into models.

  • I would rather see that it does not rely on open source projects that have not given permission to be used for training that particular AI.

    • Doesn’t the nature of most open source licenses allow for AI training though?

      Example — MIT:

      > Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions


This is actually being built right now. ATTRIBUTION.md (https://attribution.md) is a protocol that does exactly this. You drop a file in your repo root with a few lines of YAML, and it asks AI coding agents to prompt users to star the repos they built on.

The key design choice is that it does not automate anything. The agent surfaces a prompt, the user decides yes or no. No bulk starring, no forced actions. The spec also deliberately stays out of licensing territory. It is purely a social recognition layer.

It is at v0.1 and no agent supports it yet, but the spec and schema are published and open for feedback: https://github.com/attributionmd/attribution.md
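A rough sketch of that consent flow, purely illustrative: the `AttributionRequest` fields and the `collect_star_consent` helper are made up here and are not taken from the published spec or schema. It assumes the agent has already read the requests from each dependency's file and only shows the "ask once per repo, act on an explicit yes" part.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class AttributionRequest:
    # Hypothetical fields, not the real ATTRIBUTION.md schema.
    repo: str      # e.g. "owner/name"
    message: str   # short note from the maintainers


def collect_star_consent(
    requests: Iterable[AttributionRequest],
    ask: Callable[[str], bool],
) -> List[str]:
    """Ask the user about each repo once; return only the repos they approved.

    Nothing is starred here. The caller decides what to do with the
    approved list, so there is no bulk or automatic action.
    """
    approved: List[str] = []
    seen = set()
    for req in requests:
        if req.repo in seen:
            continue  # never prompt twice for the same repo
        seen.add(req.repo)
        prompt = f"Star {req.repo}? {req.message} [y/N]"
        # Only an explicit True counts as consent; anything else is a no.
        if ask(prompt) is True:
            approved.append(req.repo)
    return approved
```

An agent could pass its own prompt function as `ask` (for example, one that reads a y/n answer in the terminal) and then show the approved list to the user before doing anything with it.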

Only if there's a commercial incentive to do so, methinks. Just one of those things where I expect legal catch-up will be needed to get companies to do the right thing.

Not as long as all developers add an ATTRIBUTION.md citing all open source projects they read the source for, all companies they worked for that trained them, and all Stack Overflow answers they have used to write the code.

  • > Not as long as all developers add an ATTRIBUTION.md citing all open source projects they read the source for, all companies they worked for that trained them, and all Stack Overflow answers they have used to write the code.

    Oh? You are under the impression that software gets the same rights and privileges as humans?

    Or maybe you are under the impression that you are so special that you face no danger from having no income because the models already ingested all your work and can launder it effectively?

  • Not everything has to be symmetrical. I’m sure there is a name for that logical fallacy.

    • I don't consider it a logical fallacy so much as a philosophical debate on art vs theft that exists in both human and AI worlds.

      IMO, nothing and nobody starts out original. We need copying to learn, to build a foundation of knowledge and understanding. Everything is a copy of something else (or put another way, art is more like a sum of your influences). The only difference is how much is actually copied, and how obvious it is.

      And in the US at least, from a legal perspective, this "how obvious is it" subjective test is often one way that copyright disputes are settled.

      For example, there have been many cases of similar-sounding songs that either did in fact draw influence from an existing track (whether consciously or not) or were more likely just coincidental... but courts have ruled both ways in such cases, even when the songs sound extremely similar.