
Comment by rikschennink

5 hours ago

> No AI-generated media is allowed (art, images, videos, audio, etc.). Text and code are the only acceptable AI-generated content, per the other rules in this policy.

I find this distinction between media and text/code so interesting. To me it sounds like they think "text and code" are free from the controversy surrounding AI-generated media.

But judging from how AI companies grabbed all the art, images, videos, and audio they could get their hands on to train their LLMs, it's naive to think they didn't do the same with text and code.

> To me it sounds like "text and code" are free from the controversy surrounding AI-generated media.

It really isn't; don't you recall the "protests" against Microsoft starting to use repositories hosted on GitHub to train their own coding models? There were lots of articles and sentiments everywhere at the time.

Seems to have died down though, probably because most developers now use LLMs in some capacity. Some just use them as a search-engine replacement, others to compose snippets they copy-paste, and others don't really type code anymore, just write instructions and then review the result.

I'm guessing Ghostty feels that if they banned generated text/code, they'd block almost all potential contributors. Not sure I agree with that personally, but I'm guessing that's their perspective.

  • Right, that's what I'm thinking too (I'll update my statement a bit to make that clearer), but I constantly hear this perspective that it's all fine for text and code, yet when it's media it's suddenly problematic. It's equally problematic for text and code.

  • I bet they aren't honoring the terms of the MIT license I use for my repos. It's pretty lenient, and I bet they're still not compliant.

    • And to be frank, why would they? Who would stop them? It would take a massive case to compel them to stop, and no one seems to care about attribution anymore, or about licensing at all in most cases. Companies use torrents to download copyrighted material, the kind of thing individuals have gone to prison for, and they hardly even get a slap on the wrist.

It's not that code is distinct or "less than" art. It's an authority and boundaries question.

I've written a fair amount of open source code. On anything like a per-capita basis, I'm way above median in terms of what I've contributed (without consent) to the training of these tools. I'm also specifically "in the crosshairs" in terms of work loss from automation of software development.

I don't find it hard to convince myself that I have moral authority to think about the usage of gen AI for writing code.

The same is not true for digital art.

There, the contribution without consent, aka theft (I could frame it differently when I was the victim, but here I can't), is entirely from people other than me. The current and future damages won't be borne by me.

  • Alright, if I understand correctly, what you're saying is that they make this distinction because they operate in the "text and code" space but not in the media space.

    I've written _a lot_ of open source MIT licensed code, and I'm on the fence about that being part of the training data. I've published it as much for other people to use for learning purposes as I did for fun.

    I also build and sell closed-source commercial JavaScript packages, and more than likely those have ended up in the training data as well, obviously without consent. So this is why I feel strongly about this separation being made between code and media: from my perspective it all has the same problem.

    • re: MIT license, I generally tell people they have to credit me, and that's functionally the only requirement. Are they crediting? That's really the lowest imaginable bar; they're not asked to do ANYTHING else.