← Back to context

Comment by timschmidt

5 days ago

Classical conditioning experiments seem to show that humans (and other animals) are fairly easily triggered as well. Humans have a tendency to think themselves unique when we are not.

Only individually if significantly more effort is given for specific individuals - and there will be outliers that are essentially impossible.

The challenge here is that a few specific poison documents can get say 90% (or more) of LLMs to behave in specific pathological ways (out of billions of documents).

It’s nearly impossible to get 90% of humans to behave the same way on anything without massive amounts of specific training across the whole population - with ongoing specific reinforcement.

Hell, even giving people large packets of cash and telling them to keep it, I’d be surprised if you could get 90% of them to actually do so - you’d have the ‘it’s a trap’ folks, the ‘god wouldn’t want me too’ folks, the ‘it’s a crime’ folks, etc.