Comment by pseudosavant

4 days ago

It is actually worse than that. It is at least 30 days. There is an "almost" that is doing a ton of heavy lifting here "deletion after 30 days in almost all cases". My read of that is they can hang onto data for as long as they want, even if they usually won't. And "all traffic" with an agentic harness is basically your entire codebase you work on.

> We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases (see this post for further details). The data will help us defend against complex and novel attacks (including new jailbreaks and attacks that operate across many requests) as well as help us identify and reduce false positives.


kitchi 4 days ago

They seem to have changed the wording since you posted the comment; it now specifies exactly 30 days, with seemingly no exceptions.

These terms seem to be updated at will, though, so I'll take that with a grain of salt.

cornholio 4 days ago
I'm not sure they can actually respect that absolute 30-day commitment. Let's say some internal tool flags a suspect conversation, it bubbles up, a human operator reads it, and it looks like evidence of a crime. That employee is then legally bound in many jurisdictions to prevent the destruction of that piece of evidence.
It's one thing to commit to an automatic "everything is deleted when you press delete" policy. It's quite another to say "we'll keep some stuff for up to 30 days, look inside it for any malfeasance, then pinky promise we'll delete it".
- brookst 4 days ago
  
  It generally goes without saying that legal obligations must be met. Before this 30 day policy they already had to comply with subpoenas and government retention requests.
  Same with CSAM policies for any cloud provider. Doesn’t matter what the retention policy says, if the law says otherwise, the law wins. And there is no obligation to spell out every law in every country that might change how data is handled.
- yencabulator 3 days ago
  
  It's probably been updated several times (why does it even matter what it says now if they can update the terms at will), but now it says:
  > After 30 days, the data is deleted automatically, except in the rare cases where it's part of a safety investigation or we're legally required to keep it.
- agroot12 4 days ago
  
  They write "We will require 30-day retention for all traffic on Mythos-class models". For potentially criminal content, maybe it's not "we", but "the authorities" that require the retention?
  ... and now I wonder if "we require retention" leaves the door open to retention that is not required, but let's say convenient.
mkl 4 days ago

From https://support.claude.com/en/articles/15425695-covered-mode..., emphasis mine:
> Prompts and model completions are retained for at least 30 days and then automatically deleted, unless they are subject to a safety investigation or we are legally required to maintain them.
They keep it as long as they want.
ryanisnan 4 days ago
That's strange. Even in my hobby-toy app, I have a TOS that I bump whenever the terms meaningfully change, and in my app, it forces a re-acceptance of the new terms before using the app again.
- abustamam 4 days ago
  
  You mean your terms don't just say "these terms may change at any time and your continued use of this site implies acceptance??"
  /s
  
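The versioned-terms approach ryanisnan describes above can be sketched roughly as follows. This is only a minimal illustration, not anyone's actual implementation: `CURRENT_TOS_VERSION`, `User`, `needs_reacceptance`, `accept_terms`, and `open_app` are all hypothetical names, and a real app would persist the accepted version in its user store.

```python
from dataclasses import dataclass

# Hypothetical constant: bumped by hand whenever the terms meaningfully change.
CURRENT_TOS_VERSION = 3


@dataclass
class User:
    name: str
    accepted_tos_version: int = 0  # 0 means no version has been accepted yet


def needs_reacceptance(user: User, current_version: int = CURRENT_TOS_VERSION) -> bool:
    """Return True when the user last accepted an older version of the terms."""
    return user.accepted_tos_version < current_version


def accept_terms(user: User, current_version: int = CURRENT_TOS_VERSION) -> None:
    """Record that the user has accepted the current version of the terms."""
    user.accepted_tos_version = current_version


def open_app(user: User) -> str:
    """Gate the app behind the terms screen until the current version is accepted."""
    if needs_reacceptance(user):
        return "show_terms"
    return "show_app"
```

The point of storing a version number rather than a simple "accepted" flag is that bumping the constant automatically sends every existing user back through the terms screen.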
JaszHere 4 days ago

That's only in the summary; farther down it says:
> After 30 days, the data is deleted automatically, except in the rare cases where it's part of a safety investigation or we're legally required to keep it.
mikestorrent 4 days ago
Yep. They changed the terms, which needs legal review in my org, but the Fable model was available immediately, so of COURSE people have to go and flock to it to see how much better it is. Amazing how easy it is to spend five figures on demand and have very little to show for it; meanwhile when I want to buy a piece of enterprise software for 40-50k/year I have to spend weeks or months building the case, providing justification for ROI etc.
- SilverElfin 4 days ago
  
  Do you know where I can find the before and after versions of the terms? To me it looks the same as it was.
SilverElfin 4 days ago

Where are you seeing that updated version?

eth0up 4 days ago

I cannot help wondering if the 'we won't train on your data' applies across the fence over there in pentagon land, where the classified contracts be. Yeah, of course they are not connected. Or..

Present user-llm activity is a goldmine of intel the agencies literally spent lives and billions on getting hardly close to, yet they elect to just let this one slip by..

Maybe. Really, I don't dispute it.

But why? It's what, or precisely what, they always dreamed of.

daveshistory 4 days ago

I don't know why you'd read literally the last 25 years of leaks from mass surveillance programs and think for one moment that they've just, gosh, overlooked the opportunities.
arcanemachiner 4 days ago

We've already gone through ECHELON, USAPATRIOT, TIA, PRISM, etc. Either learn from the pattern and plan accordingly, or be one of the credulous rubes caught off guard in the next wave of leaks.
rapnie 4 days ago
> We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases
This reads to me as they can use any model that is not a "Claude model", and as for human access to that other model there can be different less restrictive privacy protections. In other words, that anything goes.
- eth0up 4 days ago
  
  Yes. Words don't mean much these days. Taking corporate doublespeak at face value seems very courageous to me.

tcp_handshaker 4 days ago

Half of my customers will drop them right away, and the other half, after I explain to them what this means.

usef- 4 days ago
It's only for this model, not the one you're already using. And they're not training on the data. It's supposedly to detect abuse etc (such as someone retrying repeatedly with different variations to get around their protections)
- HWR_14 4 days ago
  
  > they're not training on the data
  How would you know that? You can only know what they say they will do with the data.
  
- megous 4 days ago
  
  Why wouldn't they train on the data, if the goal is to build a better supervisor mechanism?
- gmerc 4 days ago
  
  Yet
  
- CorpOverreach 4 days ago
  
  Still unacceptable.
vntok 4 days ago
You must have very unrepresentative customers. What will they use?
- OtomotO 4 days ago
  
  No AI at all, like 5/6 of my customers
coldtea 4 days ago

And 99% of their other customers won't care either way.

bagels 4 days ago

How were they not already auditing access to customer data?

codebje 4 days ago

They were not keeping it beyond the timeframe necessary for the model to process it, so there wasn't access there to audit.

nullbio 4 days ago

"Even if they usually won't" is generous. I think they usually will, that's the point.

SilverElfin 4 days ago

It’s even worse than that. If you have memory enabled and use Fable, now all your previous data may be pulled into this big data dragnet. How can Anthropic possibly think this is okay?

abustamam 4 days ago
Because they think people are okay with it, or at the very least, don't care, or don't care to know.
Which, judging by how much people are using Fable, appears to be true.
- ithkuil 4 days ago
  
  An interesting way to rate limit access while also getting some data to analyze. They will lift this restriction later when they have more capacity
Forgeties79 4 days ago
Remember when people were trying to pretend Anthropic “were the good guys”?
- calgoo 4 days ago
  
  They were never the good guys; they explicitly stated that they were fine with Claude being used to murder and spy on everyone in the world except the USA.
  
daveshistory 4 days ago

Well, it's okay for them.
coldtea 4 days ago
>How can Anthropic possibly think this is okay?
If it made a profit and people didn't give them trouble for it, Anthropic would sell a placebo as a cancer cure. What they think "is okay" is what they can get away with.
- blitzar 4 days ago
  
  On a personal level, everything Anthropic has done has resulted in a dump truck of money being emptied onto the driveways of its employees. Pavlovian conditioning is incredibly strong when reinforced with generational wealth.

bmitc 4 days ago

Does anyone know about the jailbreaks and attacks they are referring to? These are done through model queries?

deminature 4 days ago
One of the major attack vectors is distillation, where millions of questions are auto-generated and coordinated to produce training data for new LLMs. Anthropic alleges Minimax, Deepseek and Kimi were trained this way. Deepseek 4 compares favorably to Opus, so they're probably trying to prevent Deepseek 5 from being a bootleg Mythos. https://www.anthropic.com/news/detecting-and-preventing-dist...
- pseudosavant 4 days ago
  
  It takes a lot of audacity to train on all the data you can without any license, attribution, etc and then act like you can own the outputs of the model so that someone else doesn't make a model from your data without a license. I've lost a lot of respect for Anthropic in the last 24 hours.
  
- anon373839 4 days ago
  
  Distillation is not an "attack", despite Anthropic themselves coining the self-serving phrase "distillation attack". And as others have noted, it is precisely identical to the sort of "attack" on published works which Anthropic themselves used to train their models.
  
- SyneRyder 4 days ago
  
  > Anthropic alleges Minimax... were trained this way
  I've had some sessions this week with MiniMax M3 where it insisted it was Claude, even though there was no mention of Claude in any system prompts or context I gave to it, and it was running in my own API harness (not Claude Code).
  Though I also wouldn't be surprised if "I am claude" is just the new "I am Mozilla/5.0 AppleWebKit KHTML Like-Gecko Chrome Safari".
  
MichaelZuo 4 days ago
Why would you trust anything they say at face value?
When they literally just showed you they are being deceptive by sneaking in the weasel word “almost”?
- alexjurkiewicz 4 days ago
  
  Firstly, none of this post is the contract people are signing. So it's merely a summary.
  Secondly, like all contracts I'm sure there will be exceptions for holding data longer than 30 days with reasonable cause, eg a legal hold.
  
- bmitc 4 days ago
  
  I'm asking for information to understand. What about that says I trust what they say at face value?

cakeface 4 days ago

The “all human access” is doing work also. Most access will likely be from AI agents.

thefounder 4 days ago

Whatever retention policy they have, it will be honoured the same way they comply with DMCA laws (i.e. if we’ve got it, it’s ours to train on and use).

mannanj 4 days ago

However, don't all these AI companies retain your non-training data indefinitely? Did I miss something where they suddenly gave you the option to opt out of retaining your non-training data? I thought that was a big money grab of theirs.

indoordin0saur 4 days ago

After the AI companies blatantly lied that they weren't hoovering up people's IP and art for training, I assume they collect any and all data they can get their hands on for training. When it comes to the big AI players feeding their future models I 100% just assume that they suck up any data we send them. Am I cynical?

sebazzz 3 days ago

> When it comes to the big AI players feeding their future models I 100% just assume that they suck up any data we send them. Am I cynical?
There is a reason enterprise contracts and plans exist. And I think even on that account we're going to find out at some point that LLMs are training on that extremely useful data.
devld 3 days ago

I think it's very likely. This is the reason why I stay on GitHub Copilot Business for the time being as a solo developer. I assume that Microsoft has less incentive than Anthropic to break the business agreement and use data for training or resell it to Anthropic. If I was using the heavily discounted subscription plan from Anthropic, I would 100% assume everything is fed to the machine. I'd rather pay whatever the API costs than give it an exact recipe to build my product.
mannanj 4 days ago

And you can't opt out of data retention for non-training purposes, so I think there's a bit of a psyop occurring here.

daveshistory 4 days ago

After 30 days and before the heat-death of the universe?

mastermage 4 days ago
I mean, deleting the universe also deletes the data, so that counts.
- daveshistory 4 days ago
  
  That's a fair point.


bethekidyouwant 4 days ago

Even worse when you git push something Microsoft gets all your code!

dannyw 4 days ago
Yes, that is the intended purpose of “git push”: it’s to save. And only if you use GitHub.
A better analogy here is probably “every time you use VS Code, the files you edit get sent to Microsoft”.
Some legitimate concerns:
• You have trade secrets. Previously, you could use services like Bedrock, etc., with signed contracts and significant reputations. Your contract is between AWS and you, and stays within your AWS security boundary.
• Security breaches. Remember when Anthropic accidentally published the source tree of Claude Code? Or Meta’s recent AI recovery bot that didn’t check if the supplied recovery email was actually the email of the Instagram account? The best way to reduce your exposure is to minimise storage.
• Weaponised T&S. For example, what if Anthropic decided to build a classifier for “usage in unsupported regions” that’s super overbearing (as we see with Fable) and vacuums up all context/input/output if there’s Mandarin? Contractually they could now retain it forever, not just 30 days, for ‘trust and safety purposes’, and perhaps have AI scan it at scale for any new or interesting ML techniques, for Anthropic’s own use. They say only that they can’t train Claude models on the data.
- bethekidyouwant 4 days ago
  
  All analogies are bad.
  
layer8 4 days ago

Only if you push it to GitHub.
tcp_handshaker 4 days ago
That is why, for the last five years, I have been checking in code of the most atrocious quality with them. So far... it's working...
- vntok 4 days ago
  
  Thank you for your service.
- aurelius_44 4 days ago
  
  The system works!
OtomotO 4 days ago

Uhm, no?
I have NOT a single project on GitHub.
One of my clients has their project on GitHub.
Every other client I have ever worked with or for ran and runs their own gitforge.