Comment by neya

2 months ago

I feel like there is an even more important crisis that is being glossed over here:

https://github.blog/changelog/2026-03-25-updates-to-our-priv...

    New Section J — AI features, training, and your data: We’ve added a dedicated section that brings all AI-related terms together in one place. Unless you opt out, you grant GitHub and our affiliates a license to collect and use your inputs (e.g., prompts and code context) and outputs (e.g., suggestions) to develop, train, and improve AI models.

We should not be using Copilot in the first place.

OpenAI/ChatGPT/Codex, Anthropic/Claude and Google/Gemini all do this.

  • > OpenAI/ChatGPT/Codex, Anthropic/Claude and Google/Gemini all do this.

    1. Everyone doing this doesn't mean it's acceptable.

    2. If you are a paid subscriber (Workspace), Google Gemini explicitly says right under the chat box:

         Your <company name> chats aren’t used to improve our models. Gemini is AI and can make mistakes.
    

    Not sure about the others.

    • I think anyone using a "Team" or enterprise plan of ChatGPT/Claude/Copilot doesn't have their data used for training; that's the same across the board.

    • My comment was not meant to excuse what they're doing, just to point out that it's the bad status quo for these services.

Looks like you can disable it though:

https://github.com/settings/copilot/features

-> Privacy -> "Allow GitHub to use my data for AI model training"

  • Yeah, but it's a shitty move though. It should be opt-in by default, not opt-out. Imagine: you keep coding normally, consciously avoiding Copilot, only to find out that GitHub has been secretly training its models on your code, just because you never turned off a setting that was switched on without your knowledge. And they didn't even have the decency to email you about it; they just posted it on a blog no one reads.

    • I got an email about it.

      It's sort of a moot point, since the whole thing is for goodwill anyway.

      They freely scraped licensed code and semi-private data across the internet and now they're pretending that they need to license anything.

      If a court rules that they had to license data in the first place, then the whole industry will actually have to start following the law.