Comment by paxys
3 days ago
I'm not quite convinced.
You're telling the agent "implement what it says on <this blog>" and the blog is malicious and exfiltrates data. So Gemini is simply following your instructions.
It is more or less the same as running "npm install <malicious package>" on your own.
Ultimately, AI or not, you are the one responsible for validating dependencies and putting appropriate safeguards in place.
The article addresses that too with:
> Given that (1) the Agent Manager is a star feature allowing multiple agents to run at once without active supervision and (2) the recommended human-in-the-loop settings allow the agent to choose when to bring a human in to review commands, we find it extremely implausible that users will review every agent action and abstain from operating on sensitive data.
It's more of a "you have to anticipate that any instructions remotely connected to the problem aren't malicious", which is a long stretch.
[dead]
Right, but at least with supply-chain attacks the dependency tree is fixed and deterministic.
Nondeterministic systems are hard to debug, this opens up a threat-class which works analogously to supply-chain attacks but much harder to detect and trace.
The point is:
1. There are countless ways to hide machine-readable content on the blog that doesn't make a visible impact on the page as normally viewed by humans.
2. Even if you somehow verify what the LLM will see, you can't trivially predict how it will respond to what it sees there.
3. In particular, the LLM does not make a proper distinction between things that you told it to do, and things that it reads on the blog.
right but this product (agentic AI) is explicitly sold as being able to run on its own. So while I agree that these problems are kind of inherent in AIs... these companies are trying to sell it anyway even though they know that it is going to be a big problem.