Comment by pseudosavant

4 days ago

It is actually worse than that. It is at least 30 days. There is an "almost" that is doing a ton of heavy lifting here: "deletion after 30 days in almost all cases". My read of that is they can hang onto data for as long as they want, even if they usually won't. And "all traffic" with an agentic harness is basically the entire codebase you work on.

> We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases (see this post for further details). The data will help us defend against complex and novel attacks (including new jailbreaks and attacks that operate across many requests) as well as help us identify and reduce false positives.

They seem to have changed the wording since you posted the comment, now specifying exactly 30 days with seemingly no exceptions.

These terms seem to be updated at will, though, so I'll take that with a grain of salt.

  • I'm not sure they can actually respect that absolute 30-day commitment. Let's say some internal tool flags a suspect conversation, it bubbles up, a human operator reads it, and it looks like evidence of a crime. Then that employee is legally bound in many jurisdictions to prevent the destruction of that piece of evidence.

    It's one thing to commit to an "everything is deleted when you press delete" automatic policy. It's quite another to say "we'll keep some stuff for up to 30 days, look inside it for any malfeasance, then pinky promise we'll delete it".

    • It generally goes without saying that legal obligations must be met. Before this 30-day policy they already had to comply with subpoenas and government retention requests.

      Same with CSAM policies for any cloud provider. Doesn’t matter what the retention policy says, if the law says otherwise, the law wins. And there is no obligation to spell out every law in every country that might change how data is handled.

    • It's probably been updated several times (why does it even matter what it says now if they can update the terms at will), but now it says:

      > After 30 days, the data is deleted automatically, except in the rare cases where it's part of a safety investigation or we're legally required to keep it.

    • They write "We will require 30-day retention for all traffic on Mythos-class models". For potentially criminal content, maybe it's not "we", but "the authorities" that require the retention?

      ... and now I wonder if "we require retention" leaves the door open to retention that is not required but, let's say, convenient.

  • That's strange. Even in my hobby-toy app, I have a TOS that I bump whenever the terms meaningfully change, and in my app, it forces a re-acceptance of the new terms before using the app again.

  • That's only in the summary; farther down it says:

    > After 30 days, the data is deleted automatically, except in the rare cases where it's part of a safety investigation or we're legally required to keep it.

  • Yep. They changed the terms, which needs legal review in my org, but the Fable model was available immediately, so of COURSE people have to go and flock to it to see how much better it is. Amazing how easy it is to spend five figures on demand and have very little to show for it; meanwhile when I want to buy a piece of enterprise software for 40-50k/year I have to spend weeks or months building the case, providing justification for ROI etc.

    • Do you know where I can find the before and after of the terms? To me it looks the same as it was.

I cannot help wondering if the 'we won't train on your data' applies across the fence over there in Pentagon land, where the classified contracts be. Yeah, of course they are not connected. Or..

Present user-LLM activity is a goldmine of intel, the kind the agencies literally spent lives and billions on and hardly got close to, yet they elect to just let this one slip by..

Maybe. Really, I don't dispute it.

But why? It's what, or precisely what, they always dreamed of.

  • I don't know why you'd read literally the last 25 years of leaks from mass surveillance programs and think for one moment that they've just, gosh, overlooked the opportunities.

  • We've already gone through ECHELON, USA PATRIOT, TIA, PRISM, etc. Either learn from the pattern and plan accordingly, or be one of the credulous rubes caught off guard in the next wave of leaks.

  • > We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases

    This reads to me as saying they can use the data to train any model that is not a "Claude model", and as for human access to that other model, there can be different, less restrictive privacy protections. In other words, anything goes.

    • Yes. Words don't mean much these days. Taking corporate doublespeak at face value seems very courageous to me.

Half of my customers will drop them right away, and the other half, after I explain to them what this means.

How were they not already auditing access to customer data?

  • They were not keeping it beyond the timeframe necessary for the model to process it, so there wasn't access there to audit.

"Even if they usually won't" is generous. I think they usually will, that's the point.

It’s even worse than that. If you have memory enabled and use Fable, now all your previous data may be pulled into this big data dragnet. How can Anthropic possibly think this is okay?

  • Because they think people are okay with it, or at the very least, don't care, or don't care to know.

    Which, judging by how much people are using Fable, appears to be true.

    • An interesting way to rate-limit access while also getting some data to analyze. They will lift this restriction later when they have more capacity.

  • Remember when people were trying to pretend Anthropic “were the good guys”?

    • They were never the good guys; they explicitly stated that they were fine with Claude being used to murder and spy on everyone in the world except the USA.

  • >How can Anthropic possibly think this is okay?

    If it made a profit and people didn't give them trouble for it, Anthropic would sell a placebo as a cancer cure. What they think "is okay" is what they can get away with.

    • On a personal level, everything Anthropic has done has resulted in a dump truck of money being emptied onto the driveways of its employees. Pavlovian conditioning is incredibly strong when reinforced with generational wealth.

Does anyone know about the jailbreaks and attacks they are referring to? These are done through model queries?

  • One of the major attack vectors is distillation, where millions of questions are auto-generated and coordinated to produce training data for new LLMs. Anthropic alleges Minimax, Deepseek and Kimi were trained this way. Deepseek 4 compares favorably to Opus, so they're probably trying to prevent Deepseek 5 from being a bootleg Mythos. https://www.anthropic.com/news/detecting-and-preventing-dist...

    • It takes a lot of audacity to train on all the data you can without any license, attribution, etc., and then act like you can own the outputs of the model so that someone else doesn't make a model from your data without a license. I've lost a lot of respect for Anthropic in the last 24 hours.

    • Distillation is not an "attack", despite Anthropic themselves coining the self-serving phrase "distillation attack". And as others have noted, it is precisely identical to the sort of "attack" on published works which Anthropic themselves used to train their models.

    • > Anthropic alleges Minimax... were trained this way

      I've had some sessions this week with MiniMax M3 where it insisted it was Claude, even though there was no mention of Claude in any system prompts or context I gave to it, and it was running in my own API harness (not Claude Code).

      Though I also wouldn't be surprised if "I am Claude" is just the new "I am Mozilla/5.0 AppleWebKit KHTML Like-Gecko Chrome Safari".

  • Why would you trust anything they say at face value?

    When they literally just showed you they are being deceptive by sneaking in the weasel word “almost”?

    • Firstly, none of this post is the contract people are signing. So it's merely a summary.

      Secondly, as with all contracts, I'm sure there will be exceptions for holding data longer than 30 days with reasonable cause, e.g. a legal hold.

    • I'm asking for information to understand. What about that says I take what they say at face value?

The “all human access” is doing work also. Most access will likely be from AI agents.

Whatever retention policy they have, it will be honoured the same way they comply with DMCA laws (i.e. if we’ve got it, it’s ours to train on/use).

However, don't all these AI companies retain your non-training data indefinitely? Did I miss something where they suddenly gave you the option to opt out of retaining your non-training data? I thought that was a big money grab of theirs.

After the AI companies just blatantly lied that they weren't hoovering up people's IP and art for training, I assume they collect any and all data they can get their hands on for training. When it comes to the big AI players feeding their future models I 100% just assume that they suck up any data we send them. Am I cynical?

  • > When it comes to the big AI players feeding their future models I 100% just assume that they suck up any data we send them. Am I cynical?

    There is a reason enterprise contracts and plans exist. And I think even on that account we're going to find out at some point that LLMs are training on that extremely useful data.

  • I think it's very likely. This is the reason why I stay on GitHub Copilot Business for the time being as a solo developer. I assume that Microsoft has less incentive than Anthropic to break the business agreement and use data for training or re-sell it to Anthropic. If I was using the heavily discounted subscription plan from Anthropic, I would 100% assume everything is fed to the machine. I'd rather pay whatever the API costs than give it an exact recipe to build my product.

  • And you can't opt out of data retention for non-training purposes, so I think there's a bit of a psyop occurring here.

Even worse: when you git push something, Microsoft gets all your code!

  • Yes, that is the intended purpose of “git push”: it’s to save. And only if you use GitHub.

    A better analogy here is probably “every time you use VS Code, the files you edit get sent to Microsoft”.

    Some legitimate concerns:

    • You have trade secrets. Previously, you could use services like Bedrock, etc., with signed contracts and significant reputations. Your contract is between AWS and you, and stays within your AWS security boundary.

    • Security breaches. Remember when Anthropic accidentally published the source tree of Claude Code? Or Meta’s recent AI recovery bot that didn’t check if the supplied recovery email was actually the email of the Instagram account? The best way to reduce your exposure is to minimise storage.

    • Weaponised T&S. For example, what if Anthropic decided to build a classifier for “usage in unsupported regions” that’s super overbearing (as we see with Fable) and vacuums up all context/input/output if there’s Mandarin? Contractually they could now retain it forever, not just 30 days, for ‘trust and safety purposes’ and perhaps have AI scan for any new or interesting ML techniques at scale, for Anthropic’s own use? They say they just can’t train Claude models on the data.

  • Uhm, no?

    I do NOT have a single project on GitHub.

    One of my clients has their project on GitHub.

    Every other client I have ever worked with or for ran, and still runs, their own git forge.