
Comment by eli

1 day ago

Claude Code only trains on data if you opt in

They've recently switched to opt-out instead. And even then, if you read the legalese, they only say they won't "train frontier models". That would (probably) still allow them to train a reward model, or to test/validate on your data and signals, without breaking the agreement. There's a lot of signal in how you use something (e.g. the accepted vs. rejected rate) that they can exploit without strictly including it in the dataset used to train their LLMs.

They switched to opt-out, with some extra dark patterns to convert people who had already opted out into opting in.

  • I did not know that. Could you elaborate?

    • New users now have to opt out of training on their data - it is enabled by default. For existing users, during the transition they updated their terms and notified you of the change in policy, giving you an option to opt in or opt out. Opt-in was the default selection. Just today they AGAIN updated the terms, presenting a click-through form on first load that looks like a permissions check (e.g. the standard dialog to enable access to the file system that we're conditioned to click through). It was actually a terms-of-service update with opt-in selected by default, even if you had already explicitly opted out. So if you hit enter to dismiss it, as you're used to doing, you just switched your account over to opt-in.

I used to be less cynical, but I could see them not honoring that, legal or not. The real answer, regardless of how you feel about that conversation, is that Claude Code, not any model, is the product.

  • I couldn't. Aside from violating laws in various countries and opening them up to lawsuits, it would be extremely bad for their enterprise business if they were caught stealing user data.

    • They don’t need to use your data in an external-facing product to get utility from it. Their ToS explicitly states that they don’t train generative models on user data; that does not cover reward models, judges, or other internal tooling that otherwise lets them improve.

    • Maybe. But the data is there: imagine financial troubles, someone buys in and uses the data for whatever they want, much like 23andMe. If you want something to stay secret, you don't send it to that LLM, or you use a zero-retention contract.


    • You don't have to imagine; you can see it happening all the time. Even huge corps like FB have already been fined for ignoring user-consent laws around data tracking, and thousands of smaller ones are obviously ignoring explicit opt-in requirements under the GDPR, at least.