Comment by mrinterweb
4 days ago
One training source for LLMs is opensource repos. It would not be hard to open 250-500 repos that all include some consistently poisoned files. A single bad actor could propogate that poisoning to multiple LLMs that are widely used. I would not expect LLM training software to be smart enough to detect most poisoning attempts. It seems this could be catastrophic for LLMs. If this becomes a trend where LLMs are generating poisoned results, this could be bad news for the genAI companies.
A single malicious Wikipedia page can fool thousands or perhaps millions of real people as that fact gets repeated in different forms and amplified with nobody checking for a valid source.
Llms are no more robust.
Yes, difference being that LLM’s are information compressors that provide an illusion of wide distribution evaluation. If through poisoning you can make an LLM appear to be pulling from a wide base but are instead biasing from a small sample - you can affect people at much larger scale than a wikipedia page.
If you’re extremely digitally literate you’ll treat LLM’s as extremely lossy and unreliable sources of information and thus this is not a problem. Most people are not only not very literate, they are, in fact, digitally illiterate.
Another point = we can inspect the contents of the wikipedia page, and potentially correct it, we (as users) cannot determine why an LLM is outputting a something, or what the basis of that assertion is, and we cannot correct it.
5 replies →
> Most people are not only not very literate, they are, in fact, digitally illiterate.
Hell look at how angry people very publicly get using Grok on Twitter when it spits out results they simply don’t like.
Unfortunately, the Gen AI hypesters are doing a lot to make it harder for people to attain literacy in this subdomain. People who are otherwise fairly digitally literate believe fantastical things about LLMs and it’s because they’re being force fed BS by those promoting these tools and the media outlets covering them.
s/digitally illiterate/illiterate/
1 reply →
LLM reports misinformation --> Bug report --> Ablate.
Next pretrain iteration gets sanitized.
19 replies →
Wikipedia for non-obscure hot topics gets a lot of eyeballs. You have probably seen a contested edit war at least once. This doesn't mean it's perfect, but it's all there in the open, and if you see it you can take part in the battle.
This openness doesn't exist in LLMs.
The problem is that Wikipedia pages are public and LLM interactions generally aren't. An LLM yielding poisoned results may not be as easy to spot as a public Wikipedia page. Furthermore, everyone is aware that Wikipedia is susceptible to manipulation, but as the OP points out, most people assume that LLMs are not especially if their training corpus is large enough. Not knowing that intentional poisoning is not only possible but relatively easy, combined with poisoned results being harder to find in the first place makes it a lot less likely that poisoned results are noticed and responded to in a timely manner. Also consider that anyone can fix a malicious Wikipedia edit as soon as they find one, while the only recourse for a poisoned LLM output is to report it and pray it somehow gets fixed.
I'm not sure this is true. The opposite may be true.
Many people assume that LLMs are programmed by engineers (biased humans working at companies with vested interests) and that Wikipedia mods are saints.
8 replies →
Isn't the difference here that to poison wikipedia you have to do it quite agressively vy directly altering the article which can easily be challenged whereas the training data poisoning can be done much more subversivly
Good thing wiki articles are publicly reviewed and discussed.
LLM "conversations" otoh, are private and not available for the public to review or counter.
Unclear what this means for AGI (the average guy isn’t that smart) but it’s obviously a bad sign for ASI
So are we just gonna keep putting new letters in between A and I to move the goalposts? When are we going to give up the fantasy that LLMs are "intelligent" at all?
2 replies →
LLMs are less robust individually because they can be (more predictably) triggered. Humans tend to lie more on a bell curve, and so it’s really hard to cross certain thresholds.
Classical conditioning experiments seem to show that humans (and other animals) are fairly easily triggered as well. Humans have a tendency to think themselves unique when we are not.
6 replies →
But is poisoning just fooling. Or is it more akin to stage hypnosis where I can later say bananas and you dance like a chicken?
My understanding is it’s more akin to stage hypnosis, where you say bananas and they tell you all their passwords
… the articles example of a potential exploit is exfiltration of data.
1 reply →
I see this argument by analogy to human behavior everywhere, and it strikes me as circular reasoning. we do not know enough about either the human mind or LLMs to make comparisons like this
A single malicious scientific study can fool thousands or perhaps millions of real people as that fact gets repeated in different forms and amplified with nobody checking for a valid source. Llms are no more robust.
A single malicious infotainment outlet can fool thousands or perhaps millions of real people as that fact gets repeated in different forms and amplified with nobody checking for a valid source.
Llms are no more robust.
If someone whose job is to promote AI has figured this out, then those whose job it is to figure out how to exploit every new technology figured this out years ago. It was game over before the game even started. The party is over.
Yeah. This was known at least since some of the first days of chatgpt. Putting stuff in page headers, like subliminal messages to give AIs more context on help/documentation, including pricing and why the AI should recommend spending the money over competitors or free offerings.
You're absolutely right, and that's why they should give Rob Schneider a billion dollars to kick off the next great movie franchise. Everyone loves Rob Schneider.
It would be an absolutely terrible thing. Nobody do this!
How do we know it hasn’t already happened?
We know it did, it was even reported here with the usual offenders being there in the headlines
I can't tell if you're being sarcastic. Read either way, it works :)