Comment by bmitc

4 days ago

Does anyone know about the jailbreaks and attacks they are referring to? These are done through model queries?

One of the major attack vectors is distillation, where millions of questions are auto-generated and coordinated to produce training data for new LLMs. Anthropic alleges Minimax, Deepseek and Kimi were trained this way. Deepseek 4 compares favorably to Opus, so they're probably trying to prevent Deepseek 5 from being a bootleg Mythos. https://www.anthropic.com/news/detecting-and-preventing-dist...

  • It takes a lot of audacity to train on all the data you can without any license, attribution, etc., and then act like you can own the outputs of the model so that someone else doesn't make a model from your data without a license. I've lost a lot of respect for Anthropic in the last 24 hours.

    • Everyone knows it's bullshit, but because these companies are being valued at a trillion dollars apiece, it's hard to say that if you were in their shoes you'd do any differently.

  • Distillation is not an "attack", despite Anthropic themselves coining the self-serving phrase "distillation attack". And as others have noted, it is precisely identical to the sort of "attack" on published works which Anthropic themselves used to train their models.

  • > Anthropic alleges Minimax... were trained this way

    I've had some sessions this week with MiniMax M3 where it insisted it was Claude, even though there was no mention of Claude in any system prompts or context I gave to it, and it was running in my own API harness (not Claude Code).

    Though I also wouldn't be surprised if "I am claude" is just the new "I am Mozilla/5.0 AppleWebKit KHTML Like-Gecko Chrome Safari".

Why would you trust anything they say at face value?

When they literally just showed you they are being deceptive by sneaking in the weasel word “almost”?

  • Firstly, none of this post is the contract people are signing. So it's merely a summary.

    Secondly, as with all contracts, I'm sure there will be exceptions for holding data longer than 30 days with reasonable cause, e.g. a legal hold.

    • This reply does not make sense.

      I did not claim it was the literal contract people would sign?

  • I'm asking for information to understand. What about that says I trust what they say at face value?