Comment by 0xbadcafebee

18 hours ago

There are two basic kinds of distillation: 1) the massive [and dumb] method where you ask a question and use the answer as reinforcement (Black Box), and 2) more targeted distillation where you use one model to directly inform/train/guide another model (RLAIF).

The latter is basically fine-tuning the model with direction from another model. Thousands of businesses do this every day to fine-tune. This is almost certainly what the Chinese labs are doing, since it has a much better effect on the end result than just getting simple answers to simple questions.

These complaints of distillation are inflating the problem to make it sound worse than it is, because they want the USG to block/ban Chinese model providers as protectionism. They have already called for more export controls on chips (which is funny because DeepSeek v4 was designed to run on Huawei chips and now the other Chinese providers are following suit). But they can't come right out and say that, so their claim is that they're asking for more export controls because distilled models might not be as safe as their own. But if you show them a jailbreak of their model that bypasses their safety, they'll tell you that any model can eventually be jailbroken so don't worry about safety.

177 comments


anon373839 13 hours ago

> These complaints of distillation are inflating the problem to make it sound worse than it is

Unfortunately, the Reuters piece itself is complicit in this dramatization. The lede paragraph parrots Anthropic's talking point that distillation is an "attack", without using quotes that would alert the reader that this framing is a corporate talking point. Distillation is NOT an attack.

p4coder 13 hours ago
Agreed! I had to do a double take and check the URL. I thought I was reading a press release rather than actual reporting.
- soperj 12 hours ago
  
  That's exactly what they pay the publicist for.
- friendzis 12 hours ago
  
  https://news.ycombinator.com/item?id=13155538
  
- da_grift_shift 12 hours ago
  
  Same thing nowadays :^)
  
w0m 5 hours ago
> Distillation is NOT an attack.
From the article -
> 28.8 million exchanges with Claude through almost 25,000 fraudulent accounts
wouldn't that be considered an attack? Not sure what I'm missing here.
- roblabla 5 hours ago
  
  An attack against what? The sanctity of "their IP" that is itself the result of a massive copyright violation campaign?
  
- VBprogrammer 5 hours ago
  
  It's merely a ToS violation.
  
- dec0dedab0de 5 hours ago
  
  That's violating TOS, spamming, possibly a DDOS, but the distillation in and of itself is not an attack; it's just using the model.
  Like the difference between scraping a site with one or two active connections vs thousands. It's not the scraping that is an attack, it is how they are going about it.
  
- HarHarVeryFunny 4 hours ago
  
  Just sending a request to a service does not constitute an "attack". It seems that what Anthropic mean by "fraudulent account" is probably just one violating their terms of service - misuse of a subscription account, and/or the presumed nature of what the user was trying to do.
  I guess Anthropic would regard any developer using their subscription plan with OpenCode to be operating a "fraudulent account", maybe an "attacker" too. Now we know how they think of anyone using Claude to develop software competing with Anthropic. Only an "attacker" would want to vibe code their own harness, or god forbid want to learn how to build/train an LLM.
  Of course Anthropic's wording is intended to be deliberately provocative, since they are trying to manipulate the US government into shutting down the Chinese competition.
- dundarious 5 hours ago
  
  Attack or customer
- svachalek 5 hours ago
  
  Is an attempt to copy all or parts of a model an attack, when models have very questionable copyright status? Maybe? I don't think most people have much sympathy here though.
- stingraycharles 5 hours ago
  
  Let’s not forget that by the same logic, Anthropic et al are “attacking” copyright holders all around the world by scraping their data unauthorized for training.
  Pot calling kettle black.
  
gojomo 4 hours ago
Distillation done via bulk automated activity of fraudulent accounts, in violation of a terms-of-service, can reasonably be called "an attack" – specifically a "distillation attack" – even though distillation itself isn't necessarily an "attack".
This is similar to how compromising an account through bulk automated trials of many passwords is reasonably called "an attack" – specifically a "dictionary attack" – even though using a dictionary is not itself an "attack".
You shouldn't need to smuggle your sympathies (for the tactic or perpetrators) or antipathies (for the target) into peculiar, judgy language prescriptivism against common, understood usages, prescriptivism that then labels Reuters "complicit" for simply reporting Anthropic's claims accurately. That's what Reuters is supposed to do, in a story about a letter Anthropic wrote!
- ndriscoll 24 minutes ago
  
  Labeling it as an attack is smuggling sympathies. It is not common; there are only a small number of people who even discuss the concept. A company buying a product with the intent to reverse engineer or copy its features is likewise not an attack; it's just normal competition that benefits society.
crispyambulance 7 hours ago
The standard of neutrality that people here pretend to require from news organizations is not even remotely realistic.
It was a timely story from Reuters. They do fast news feeds, like APnews. Could it have been better or more accurate? Sure, they could have gone into why distillation may or may not be seen as "an attack". But then it would have been a more involved story, defeating the purpose of a news feed.
The Reuters piece was "good enough". Some other place like the NYTimes or WSJ can follow up with more detailed investigative coverage if it's a worthwhile story.
- crmd 6 hours ago
  
  I don’t want or need fast and “good enough” news, and I’m gonna try and make a case that you don’t either.
  Until very recently, all of modern civilization was built by people who got their news at most once a day. Reputable bureaus like Reuters took that day to get it right.
  I’m not the national security advisor, so I don’t need a push notification that there was an earthquake in Nepal, or a bullshit rush-job briefing on Chinese AI distillation tactics.
  
- ipaddr 6 hours ago
  
  Good enough slop to serve the masses. Doesn't need to be truthful because it's fast? Why even bother to write anything?
  
fny 6 hours ago
Distillation may not be an attack, but it is a ToS violation and could be seen as IP theft.
Any reasonable company would be pissed if a competitor, especially at Alibaba's size, leveraged that company's R&D to compete. It is, in this sense, a corporate attack.
If you want to roll your eyes at distillation concerns, you might need to excuse Anthropic for originally using pirated material to train their models.
- ndriscoll 1 minute ago
  
  What IP? It seems pretty obvious to me that it's not:
  * trademarks (not using the mark)
  * patents (what patent?)
  * copyright (the code and models are all different, and machine outputs lack creativity and are not copyrightable)
  * trade secrets (any member of the public has the same access to input/outputs)
  So what is "IP" here?
- bad_haircut72 6 hours ago
  
  More the opposite - companies who stole IP for their own benefit have no leg to stand on when others do it back. Personally I couldn't care less if Chinese labs rip off Anthropic. It's what America would do if they wanted to, for whatever reason (they probably do it right back secretly anyway).
dist-epoch 10 hours ago
Reuters is probably the most rigorous news agency in the world.
> it said was the largest known attack
> Anthropic said in the letter it was supportive of the U.S. government's efforts to combat the attacks
both times the word "attack" appears it's clearly stated that the word was used by the company, it's a direct company quote.
actually putting it into quotes would be editorializing
> Unfortunately, the Reuters piece itself is complicit in this dramatization
how would you feel if somebody quoting you would turn your word dramatization into "dramatization" because they don't agree with your assessment
- Laurel1234 8 hours ago
  
  > how would you feel if somebody quoting you would turn your word dramatization into "dramatization" because they don't agree with your assessment
  This is exactly what a news agency should be doing though. When the dude showed up to Comet Pizza to look for Hillary Clinton or whatever, do you figure they should've printed "Local hero saves children from predatory cabal"?
  
- psychoslave 10 hours ago
  
  Well, let’s say you put the picture of some political figure, and put in highly contrasted red, bold large catchy font, "TERRORIST THAT KILLED MILLION PEOPLE", then below that in barely visible contrast, in tiny discrete letters, "is what this person probably will claim to be against".
  This whole sentence technically will be correct, 100% guarantee, whatever this person actually even said or thinks.
  From a propaganda point of view, framing the elements of language is even more important than what the statements actually state to be true or possibly true.
  
echelon 8 hours ago
Anthropic raped everyone without asking and stole their labor to build their career-commoditizing tech.
Distillation is Robin Hooding it back so that one trillion dollar company doesn't reap all the benefits of their automation of the workforce.
Distillation is Prometheus bringing fire from the gods to give to ordinary humans. Something we all own anyway, but that was kept from us.
Distillation is freedom.
Everyone should be pro-distillation. We should all work together to distill every proprietary model.
Anthropic stole. OpenAI stole. Google stole. ElevenLabs stole. Suno stole.
We should be able to get it all back.
- SillyUsername 8 hours ago
  
  And a number of Qwen variants are available to self host. Do Anthropic have any like that?
  
- mrngld 8 hours ago
  
  Eaaaaasy now, the Chinese labs aren't freedom fighters on behalf of the common man. They're not non-profits, they're not neutral transnational organizations only dedicated to open source efforts.
  They're Chinese companies offering open source models now as loss leaders to keep themselves in the game because they know virtually nobody, especially in the corporate world, would contract with them and give them access to their data. They might as well just send a Dropbox link of all their sensitive data directly to their Chinese competitors, same end effect.
  They're also doing it as the digital equivalent of what they've done in other industrial sectors for decades. Undercut and flood the market and once you've killed or severely hindered your competition, then you have the market cornered. The moment they can afford to, these open source releases will stop.
  Then the world will be stuck, just the way the world is largely stuck on rare earths. Instead of being able to regulate the leading companies from DC and Brussels, they'll be stuck watching Beijing call the shots.
  That world would likely always have guys like Mistral and Trinity, but it's an open question if they'll ever catch up to the frontier.
  And then Beijing will enjoy access to the data (ask any multinational operating in China for more than 2 seconds how useful contracts and China's legal system are for protecting IP), and these companies will roll in the money, and the Chinese supply chain will grow up behind the labs.
  So, let's not pretend they've got the moral high ground. No. That boot just isn't on your neck yet. They're playing the long game -- and they're good at it.
  

ALLTaken 6 hours ago

They want to create a monopoly and destroy every competitor, before they got a chance to rival them.

Why can't OSS software rival closed source software? It should be an open market, at least "somewhat", so what's happening for real? Will EU providers also get banned if they reach or exceed US model capabilities?

Closed source providers can close your account on a whim and destroy your business, and then use the data you supplied them to create a competitor (Meta, Google, OpenAI, Anthropic).

deaton 6 hours ago

Well Zai's GLM 5.2 legitimately is a frontier-level model, though not quite parallel with Opus or Fable. Unfortunately, it's too damn big to run locally for most people. That's the bottleneck right now; the open-weight models exist but something capable of competing with the frontier models just can't run on anything normal yet.
buellerbueller 6 hours ago
>They want to create a monopoly and destroy every competitor, before they got a chance to rival them.
VC/Startup playbook 101.
- zobzu 5 hours ago
  
  also why cant i have my own airport, too big to fit in my backyard... you guys lol.

gmerc 14 hours ago

https://research.nvidia.com/labs/lpr/slm-agents/ - Distillation data is a natural byproduct of using these models. There's no effective defence against it. Anthropic is degrading thinking blocks to summaries to slow it down and hide model internals, but in the end, the math says you're SOL and it works on MNC/Large Corporate scale well enough that the moment cost becomes a priority, you're left without the lock in you need to keep customers paying.

alfiedotwtf 11 hours ago
Byproduct? It’s essentially the only part of an LLM that is useful, because it’s the WHOLE product!
It’s the same reason why DRM for audio and video is a non sequitur - if you want a person to see or hear audio or video, eventually at the end of the chain, it’s going to be converted to sound for the ear and light for the eyes - that’s where you attach your tap.
Without a model generating tokens, what’s the point? So if Anthropic somehow disable quality token generation, what’s the point!
- TeMPOraL 7 hours ago
  
  That's why the harness is moving server-side: because generating tokens is not the actual point of the model, not for the users. Especially with tool calling giving us agents that can act, most of the tokens generated are not, themselves, critical to the end users. Specifically, a lot of tokens go into orchestrating actual tool calls, and most "thinking tokens" are relevant to users only insofar as they help users keep track of and verify what the LLM is doing. So all those tokens can be hidden or replaced by partial summaries, and all of that can happen server-side, and then there's very little to distill from.
  

giancarlostoro 6 hours ago

Heck, one of my favorite fine tuned copies of Qwen uses Opus 4.6 Reasoning distilled. I'm not sure where the maintainer is based, but I'm in the States, and if I had the hardware to do similar things I would. It's like you say, basically everyone is doing it. It kind of makes sense to me too given that you can have roughly similar data, but your reasoning logic is what the real secret sauce is in my eyes. It doesn't matter if you know everything in the world, if you don't know how to reason with that information.

https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-...

cm2187 12 hours ago

Stupid question: I was under the impression that these models were trained on PB of data. Surely the number of question/response pairs they can extract from querying a bigger model (Claude) is fairly modest. How is it not a drop in the bucket vs the training dataset?

ACCount37 10 hours ago
It's not about how big your dataset is - it's about how you use it.
I jest, but I'm also completely serious. 1T tokens from Claude can teach a model something 1T tokens scraped from the open web can't. Things like "how an LLM can problem solve effectively", or "how an LLM should use tools", or "how to construct reasoning chains", or "when to double check", or "what innate capabilities an LLM can or can't rely on".
Those are valuable things that Anthropic's own team spent a lot of effort post-training into Claude. Distillation allows them to be extracted and transferred to an otherwise unremarkable base model.
- macleginn 10 hours ago
  
  An unremarkable base model will remain an unremarkable fine-tuned model that memorised a couple thousand input-output pairings.
  
- epolanski 9 hours ago
  
  Can you back up this with hard data and evidence?
  Most research converges to the idea that RL on synthetic data makes models worse, not better.
  If what you claim was anywhere near that relevant, then we would've long achieved singularity by simply feeding increasingly better output to the training of the next model in a loop. Yet this doesn't work.
  25 million turns on Claude output is a small amount, yet an expensive one (we're talking hundreds of millions of dollars) that is better spent on compute.
  There's no evidence such a process works, but I'd like to know more if I'm wrong.
  
reasonableklout 11 hours ago
There are multiple stages of training, and the data/compute mix at each are quite different and produce different "layers" of intelligence.
The pretraining stage is the first stage which consists of "next token prediction" on the entire internet, PB of tokens, etc. This is what most people think of when they think of training LLMs, however it produces a "base model" which is not really "intelligent", but rather much like a blurry JPEG of all human language and knowledge. You cannot really talk to such a model; it will simply complete your prompt by producing both sides of the conversation. Note however at some level the training has encoded enough structure through compression that it is able to simulate all sorts of phenomena, from human conversations to code. The great R&D difficulty here is to scale pretraining so that it can proceed smoothly in vast distributed datacenters in a fault-tolerant manner.
The next few stages are collectively called post-training, and typically consist of supervised fine-tuning, then reinforcement learning.
In supervised fine-tuning, the model is further trained to predict the next token, but on a much more focused data set of natural language conversations where the "assistant" and "user" turns are explicitly delineated with special tokens. The output of this stage is a model which is capable of carrying on proper conversations, but typically with no ability to creatively problem-solve, and less of a personality. The data and compute are many orders of magnitude smaller than in pretraining.
The reinforcement learning stage used to be a small part of model training, but ever since AI-assisted coding took off, it has become a larger and larger chunk of training. In recent models, the compute spend on RL has allegedly come to rival or even exceed that of pretraining [1], which is a bit scary because RL is classically what led to sci-fi like AIs which are extremely good at accomplishing goals to the detriment of everything else.
The way that RL works is that you put an instance of your model in some environment (such as a VM containing a git repository) and give it a task (such as fixing the linked GitHub issue). The model will then generate a bunch of attempts to solve the task, which we call "trajectories". In most cases there is either an objective measure of task success (such as passing the tests) or a fuzzy measure (such as having another LLM look at the results and provide a score). This is called the reward, and the model learns slowly by producing trajectories that receive reward. It can actually be quite hard to prevent "reward hacking" from the model here, so the rewards must be shaped very carefully. Much R&D labor goes into this, as well as into challenges similar to those of distributed pretraining.
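To make the loop concrete, here is a toy sketch, nothing like a real training stack. The three "patches", the `run_tests` reward, and the learning rate are all made up for illustration; the update is a plain REINFORCE-style step on a three-way categorical policy. Note that zero-reward samples change nothing, which is the cold-start problem in miniature.

```python
import math
import random

# Hypothetical toy task: pick the patch that makes the (pretend) tests pass.
ACTIONS = ["patch_a", "patch_b", "patch_c"]

def run_tests(action: str) -> float:
    """Objective reward: 1.0 if the pretend test suite passes, else 0.0."""
    return 1.0 if action == "patch_b" else 0.0

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(steps: int = 500, lr: float = 0.5, seed: int = 0):
    rng = random.Random(seed)
    logits = [0.0] * len(ACTIONS)  # the "policy" parameters
    for _ in range(steps):
        probs = softmax(logits)
        # Sample one trajectory (here just a single action) from the policy.
        idx = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        reward = run_tests(ACTIONS[idx])
        # REINFORCE-style update: d log pi(idx) / d logit_j = onehot_j - probs_j.
        # A trajectory with zero reward contributes no update at all.
        for j in range(len(logits)):
            grad = (1.0 if j == idx else 0.0) - probs[j]
            logits[j] += lr * reward * grad
    return softmax(logits)

final_probs = train()
print(dict(zip(ACTIONS, final_probs)))
```

Real setups replace the single action with a long multi-step trajectory and the pretend test with an actual sandboxed run, but the shape (sample, score, upweight what scored) is the same.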
A significant challenge is that coding/knowledge work tasks these days are getting extremely difficult; we are far beyond the 2024 days where models could barely solve the easiest problems in SWE-bench. Tasks at the frontier now look more like mini projects that would take humans multiple hours or even days to finish (or in some cases, research-style tasks that would be beyond reach for even top human experts, such as the Erdős unit distance problem which was posed in 1946 but wasn't solved until recently, by GPT-5.5). Huge numbers of trajectories must be produced, and huge numbers of them produce zero reward and therefore are useless for learning. Getting a cold start requires running tens of thousands of instances of your model in VMs in parallel for multiple days to produce trajectories, to say nothing of the GPU costs.
So what do you do when you only have a model which is capable of basic conversations but cannot even begin to tackle basic coding tasks, use tools, etc? The approach that companies behind the frontier have decided on is to bootstrap their learning process by having an already extremely intelligent model such as Claude produce hundreds of thousands of seed trajectories for them. Then they can use this data to get a warm start and begin learning immediately. And if you use Claude for your reward model too, you get to skip the nastiness of reward shaping.
Therefore, even if in number of raw tokens the data are much smaller than internet-scale pretraining data, the value that each token provides is far far greater.
[1] For example, Grok 4 compute spend on RL was ~100% of that of pretraining: https://www.interconnects.ai/p/grok-4-an-o3-look-alike-in-se...
- petesergeant 10 hours ago
  
  props for a great write-up
  
musebox35 8 hours ago

Training isn’t a single homogeneous step. It starts with pretraining which requires bulk PB of data but you have less quality concerns here. You cover the whole data distribution. Later stages require less and less but increasingly higher quality and complex datasets. The late stage ones are highly curated and might even be sourced from world subject experts. This is where frontier labs with big pockets have the advantage.
woctordho 12 hours ago

Actually nowadays LLMs are only trained with TBs rather than PBs of data, and it's not too hard to find GBs of agent traces online.
eru 12 hours ago
This might be like an observational study vs a study with a control?
- anon373839 12 hours ago
  
  From what I understand, at this point, the main value of stronger model outputs is simply to bootstrap reasoning behavior during the RL post-training phase. It gets you past the “cold start” problem with RL, after which the outputs aren’t needed anymore. From then on, it’s hill climbing and that requires environments for the model to interact with and get rewards from.

summarybot 5 hours ago

It's about training data and using Claude to compare 2 outputs and have it indicate the better one. This gives you higher quality training data that you can use to train a fresh set of weights. Weights don't get adjusted on the fly; instead the dataset for training is improved and then you train afresh. And it's hard to detect because you're just asking the model which of these outputs for a given prompt is better, or something along those lines.
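Roughly what that looks like as code, as a sketch only. `stub_judge` is a hypothetical stand-in for the call to the stronger model (here it just prefers the longer answer so the example runs offline), and `PreferencePair` / `build_preferences` are names invented for this illustration, not any lab's actual pipeline.

```python
from typing import NamedTuple

class PreferencePair(NamedTuple):
    prompt: str
    chosen: str
    rejected: str

def stub_judge(prompt: str, a: str, b: str) -> str:
    """Stand-in for asking a stronger model which answer is better.
    Purely illustrative: prefers the longer answer, returns "A" or "B"."""
    return "A" if len(a) >= len(b) else "B"

def build_preferences(candidates, judge=stub_judge):
    """Turn {prompt: (answer_a, answer_b)} into chosen/rejected training pairs."""
    pairs = []
    for prompt, (a, b) in candidates.items():
        verdict = judge(prompt, a, b)
        chosen, rejected = (a, b) if verdict == "A" else (b, a)
        pairs.append(PreferencePair(prompt, chosen, rejected))
    return pairs

candidates = {
    "Reverse a string in Python": (
        "s[::-1]",
        "Use slicing: s[::-1] returns a reversed copy.",
    ),
    "What is 2 + 2?": ("2 + 2 equals 4.", "4"),
}
pairs = build_preferences(candidates)
for p in pairs:
    print(p.prompt, "->", p.chosen)
```

From the provider's side each judge call is just an ordinary "which of these two is better" request, which is why it blends in with normal eval traffic.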

handoflixue 13 hours ago

> But if you show them a jailbreak of their model that bypasses their safety, they'll tell you that any model can eventually be jailbroken so don't worry about safety.

They claim two things:

1) The specific, available jailbreak for Fable 5 is not dangerous - this has been confirmed by multiple experts, and there is no credible evidence against this claim (in other words, Anthropic is probably correct)

2) It is impossible to build an LLM that is immune to all jailbreaks. Again, there is no credible evidence against this claim, i.e. Anthropic is again entirely correct.

If #1 was false, they could just publish the details of the jailbreak - it supposedly only works on Fable 5, so there's no possible danger.

If #2 was false, surely some other LLM lab would have done it by now. Especially since a number of governments have made it clear there is a market for such a project.

mcintyre1994 13 hours ago
Supposedly the details of the ‘jailbreak’ are that you give it insecure code and say “fix this code”, and it does, and then you ask it for test scripts and that’s effectively an exploit against the unfixed code.
If true then I have no idea how anyone’s going to release a useful model that doesn’t have the same jailbreak. https://www.theregister.com/security/2026/06/15/feds-freaked...
- handoflixue 11 hours ago
  
  If that's the extent of the jailbreak, then the government should have banned every existing LLM - their story only makes sense if there's some Fable-specific capability that got unlocked.
  
Charon77 11 hours ago
> If #2 was false, surely some other LLM lab would have done it by now.
This is a logical flaw. An LLM that is immune to jailbreaks _could_ exist, just not yet, or maybe nobody talks about it. Yes, there's a market, but this whole AI boom is too recent to make any claims.
- gf000 11 hours ago
  
  Like how would you even define what a jailbreak is?
  
agos 12 hours ago
I'm pretty sure that Gödel's incompleteness theorem and its consequences pretty much guarantee #2
- gwd 11 hours ago
  
  I'm guessing you mean, the incompleteness theorem guarantees that nobody can prove their model is un-break-able?
  I don't think that's quite what it means. The relevant result (strictly speaking the halting problem, a close cousin of incompleteness) says that it's impossible to write a function, "will_halt(program, input)", that will be correct for all possible {program, input} pairs. But for a particular program, you may be able to write a proof that it will halt for all inputs -- that's what software verification is about.
  The implications here would be that nobody can create a "will_jailbreak(model, input)" function which works for all model/input pairs. But we don't need a general function which works for all model/input pairs; we just need a way to prove that for a specific model, there will be no jailbreaks for any input. As with software verification, this may require that the model be developed in a specific way.
  Granted we don't currently know how to make such a proof regarding neural networks; but that's not because of Gödel.
- dgellow 11 hours ago
  
  Mind to elaborate?
- Zababa 9 hours ago
  
  No actually I don't think it does and I don't think they're related.
- monkey_monkey 11 hours ago
  
  Exactly. It's impossible to guarantee #2 doesn't happen (ie protect against all jailbreaks) for any system of sufficient complexity.

dannyw 16 hours ago

If you’re doing evals, you’re basically doing RLAIF without training a model; just looking at the results.

Fundamentally it is very difficult to stop this while still making your AI models useful.

zmgsabst 13 hours ago

Similarly, if you did a corpus study on bioRxiv to summarize recent science findings — you could use the same questions and answers to fine tune a model.
There is no way to communicate information at scale to companies through the API, for anything approaching a real application, without that information forming a corpus another model can be trained on.
But it wouldn’t be the first time they broke a model:
Their “guardrails” that cause it to reject user prompts also mean it relies on its pop science summary of medicine to tell you why bioRxiv is wrong rather than accurately summarize the papers.
They’ve successfully created a smug, argumentative average of the internet which refuses to even consider it might be wrong or that it’s reading a science paper which is based on measurements and not vibes — but why would I pay for that?
I get it for free online.

sorenjan 8 hours ago

Doesn't "real" distillation use the logits instead of the final tokens? I would classify this more like using a model to generate synthetic training data.

SubiculumCode 11 hours ago

The compute deficit of Chinese AI companies is real, and it IS THE ONLY competitive advantage that Western companies have.

The only way the U.S. keeps that edge is to prevent distillation. The only way Chinese companies can make up for the deficit in compute is to distill. There's innovation in great supply on every side of the ocean. It's about the chips. And in terms of national security, for the U.S., and for China, it's about the chips and the distillation that undermines that advantage. This is an arms race.

HarHarVeryFunny 6 hours ago

If compute or access to training data were the only issues, then companies like Meta and X.ai (Grok) should be doing better, even Google for that matter. Musk even admitted that Grok used training data from OpenAI models.
https://techcrunch.com/2026/04/30/elon-musk-testifies-that-x...
While there is no moat as such, there is still a lot of expertise that goes into training SOTA models. There's a reason Google was willing to pay $2.7B just to get Noam Shazeer back to improve Gemini.
gmerc 8 hours ago

You got that wrong. The forcing function of compute scarcity is an advantage not a detriment. The amount of investment pulverized in performative model training and dead ends (Hi Sora) should make this obvious.
pennomi 9 hours ago
If saying “plz don’t distill me” is your moat, you don’t have a moat.
- SubiculumCode 9 hours ago
  
  No. What will happen is it will turn dark. No public release. National Security uses only, or in carefully vetted industry settings.
  
davedx 9 hours ago
Define compute deficit?
They've been bringing out open weight models competitive with frontier models. How could they do that if they had a compute deficit?
- rescbr 5 hours ago
  
  If they need to divert inference resources to train models, this counts as a compute deficit to me.
  I'm using GLM-5.2 daily for my own stuff, and during Chinese business hours, especially in their afternoon, it's a festival of rate limits.
- RugnirViking 9 hours ago
  
  I believe this article is about the technique they may or may not have used.
PunchyHamster 11 hours ago
> The only way the U.S. keeps that edge is to prevent distillation.
For how long? A year? How long till a model that is a year behind will be fine for 90%+ of use cases?
- dofm 4 hours ago
  
  Putting aside agentic coding, that is to say, if you judge LLMs as a consumer technology (an old-fashioned idea for the inward-looking tech industry admittedly), then open weights LLMs, even quite small ones like Gemma 4, can likely already satisfy 90% of applications with a bit of help from search and browse tools.
  Much of the arms race for better LLMs exists to satisfy only the IT industry's needs.

janalsncm 14 hours ago

Yeah I think the technical term is something more like “pseudo-labeling”. The OG distillation requires logits which Anthropic doesn’t provide.

lemax 13 hours ago

I've used RLAIF to build out heuristic based non-LLM models for various decision systems and achieved like, 95% F1 on certain projects. We're in a place where models can be used to fine tune a lot of stuff via loops.

friendzis 12 hours ago

> These complaints of distillation are inflating the problem to make it sound worse than it is

This is, in part, a problem every judicial and legislative system has faced since forever: form versus function.

Take a classic elicitation technique from spying: a foreign spy meets a military officer/scientist at a bar, strikes up a conversation, wonders aloud how a missile could hit some target at some accuracy, and elicits a response that with laser guidance it is entirely possible. From this they get info that there is some technology to laser guide missiles. Or in retail, a competitor hiring a secret buyer for core baskets of goods and analyzing prices in the receipts.

The function is espionage, the form is conversation and all info is in a sense provided willingly. Where do you pull the slider?

These distillation "attacks" are not only indistinguishable from evals, they ARE evals. The function is training one's own model, the form is eval. Normally, one would expect a discussion based on risk-benefit analysis of which direction to push the legality slider. The problem with these recurring statements is that they invoke enshittification of legislation.

crazylogger 9 hours ago

Chinese labs access Claude via API. Isn't it the black box method by definition?

killerstorm 6 hours ago

I'm sorry, but you got the terminology exactly backwards. Training on the answer is called supervised fine-tuning.

Just for the sake of clarity:

0. Full distillation uses logits of the teacher model - that's much more information than the text itself. This is a kind of distillation used inside labs, but one can't distill Claude this way as logits are not available via API.

1. Supervised fine-tuning on synthetic data might be called blackbox distillation. I guess that's what you meant in your case (1).

2. Reinforcement learning (like RLAIF) uses the least amount of information from the teacher, i.e. only a few bits per task.
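
To make the difference between cases 0 and 1 concrete, here is a minimal sketch in plain Python. The 4-token vocabulary, the logit values, and the function names are all made up for illustration; this is not any lab's actual training code. The point is only that the first loss needs the teacher's full logit vector, while the second needs nothing but the token the teacher emitted, which is all an API caller sees.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities, optionally softened by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Case 0: KL(teacher || student) over the whole vocabulary.
    Requires the teacher's logits, which a text-only API does not expose."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def sft_loss(sampled_token, student_logits):
    """Case 1: cross-entropy on the single token the teacher actually produced.
    Requires only the teacher's text output."""
    q = softmax(student_logits)
    return -math.log(q[sampled_token])

# Hypothetical values for a 4-token vocabulary.
teacher_logits = [2.0, 1.5, -1.0, -3.0]
student_logits = [0.5, 0.2, 0.1, 0.0]
sampled_token = 0  # the one token an API caller would observe

print("logit distillation loss:", kl_distillation_loss(teacher_logits, student_logits))
print("SFT loss on sampled token:", sft_loss(sampled_token, student_logits))
```

In the first loss the student learns that token 1 was almost as likely as token 0; in the second it only learns that token 0 was chosen.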

mannanj 14 hours ago

>But if you show them a jailbreak of their model that bypasses their safety, they'll tell you that any model can eventually be jailbroken so don't worry about safety.

Yes this is in line with what Anthropic said in their public statements about their Fable access restriction by the government directive. The hypocrisy and inconsistency in their statements and behavior feels quite childish and controlling. I believe our companies and their leaders, friends among our other influential leaders and leaders from rich social classes, want to actively hurt most people as this behavior looks to be quite self-interested.

topato 13 hours ago

Not to mention, the person who brought this quote unquote jailbreak to the Trump Administration was Amazon’s new CEO. They know their IPOs are coming up, so locking their competitors out of the U.S. (even if just for the weeks surrounding the IPO date) would be a major boon. The White House seems to love making announcements just for the sake of making the market move... Coincidentally, right after POTUS buys a massive amount of the beneficiary company’s stock (Buy Dell Computers, lol)

fnord77 14 hours ago

Can you reach into the model and "transplant" weights directly?

X-Ryl669 11 hours ago

I'm not 100% sure it's not possible. If (I don't know) it's possible to freeze the temperature of the model so it's deterministic, and if you could make a map of produced words back to tokens (via HMM probably), then you can probably alter a minimal input and observe the output to model it. If you perform waves of such minimal alterations, you can expect to be able to locate the distance where each alteration impacts the model (the idea being that a small alteration on output is likely due to the last layers of the model, and a small alteration is likely due to the deeper layer). Once you've located most of the last layer(s?) weights, you can try to solve for them. With a model of hundreds of billions of weights, the last layers will likely be so huge that it's probably technically infeasible, but it's theoretically possible.
jorisw 12 hours ago

No, you'd need to have the model on your filesystem for direct access, and then the architecture would need to be the same.
parineum 13 hours ago
If you have access to the weights, you can just use them as is...
- HarHarVeryFunny 8 hours ago
  
  Anthropic are not saying they have been hacked - they are saying that Alibaba have been sending a lot of requests to their servers.
antonvs 13 hours ago
You can do things like that - one example is averaging weights between related models - but not with Anthropic's models, because outsiders don't have access to the weights.
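
A toy sketch of the weight-averaging idea, assuming both models share the same architecture and parameter names (the dictionaries and names below are hypothetical stand-ins for real checkpoints, and it only works at all when you actually hold both sets of weights):

```python
def average_weights(state_a, state_b, alpha=0.5):
    """Linearly interpolate two models' parameters, name by name.
    alpha=0.5 is a plain average; the models must have identical layouts."""
    if state_a.keys() != state_b.keys():
        raise ValueError("models must share the same parameter names")
    merged = {}
    for name in state_a:
        a, b = state_a[name], state_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]
    return merged

# Hypothetical tiny "checkpoints" with flat lists standing in for tensors.
model_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
model_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}

merged = average_weights(model_a, model_b)
print(merged)
```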
- fulafel 13 hours ago
  
  Weights are just data on a server, so we don't know whether outsiders have access (either via break-in or arrangement).
  

JumpCrisscross 13 hours ago

> These complaints of distillation are inflating the problem

They’re also missing the point. What would have happened to a member of the Manhattan Project who, through personal pursuit of profit, neglected their duty enough to let the bomb leak?

nixon_why69 11 hours ago

The companies are all for-profit companies. It's not like they're selling out some national security goal for profit; profit is the point.
Anthropic already heavily restricts Chinese traffic but that only jams up researchers and regular Joes. Anyone motivated enough can hop a flight to Singapore with an NVMe drive in their pocket.

catigula 6 hours ago

Chinese companies are engaging in anti-competitive practices, as usual. They are rogue actors on the economic scene. If it were feasible, they'd be widely banned, and for good reason.

amanaplanacanal 3 hours ago
Bringing more competition is "anti-competitive" now.
- catigula 3 hours ago
  
  Merely copying products that actual companies produce and making them cheaper is anti-competitive. There's no incentive for the products to be developed in the first place in a market if this is happening. This is why copy protections exist in civilized countries (not China and to a lesser extent India).
  