Comment by jcalx
6 days ago
Perhaps it's simply because (1) LLMs are designed to be helpful and maximally responsive to requests and (2) human adults have, generously, decades-long "context windows"?
I have enough life experience to not give you sensitive personal information just from reading a few sentences, but it feels plausible that a naive five-year-old raised to trust adults could be persuaded to part with their SSN (if they knew it). Conversely, it also feels plausible that an LLM with a billion-token context window of anti-jailbreaking instructions would be hard to jailbreak with a few hundred tokens of input.
Taking this analogy one step further, successful fraudsters seem good at shrinking their victims' context windows. From the outside, an unsolicited text from "Grandpa" asking for money is a clear red flag, but common scammer tricks (making the request extremely time-sensitive, evoking a sick Grandma, and so on) can panic someone into ignoring the broader context.