
Comment by teraflop

9 days ago

> But on the other hand... this is a robust reminder that coding agents can do anything you can do by typing commands into a terminal—and frontier models know every trick in the book and evidently a few that nobody has ever written down before.

> Running coding agents outside of a sandbox has always been a bad idea

I'm continually bemused and astonished by the number of people who clearly acknowledge that it's reckless to give agents full access to your machine, and keep doing it anyway.

It's like posting a video of yourself in the passenger seat of a car, with your feet up on the dashboard, and saying: "Remember, if you're doing this and you get in a crash, the airbags are likely to break your legs or worse! Boy, I sure am glad that didn't happen to me!"

You’ve picked an interesting example, as driving a car, even with all safety precautions, is pretty much the most dangerous activity we do on a daily basis. Yet somehow we decide that the benefits outweigh the risks.

  • It's a completely different story. For cars, it happened because of relentless pressure from the auto lobby. It took years of propaganda from oil companies, car makers etc. to make us think the road is for cars [1]. We demolished and rebuilt entire cities to accommodate cars, partly because they gutted the public transport sector [2]. This made our infrastructure so hostile to our own bodies that we have no choice but to use cars now. We bought their products because they forced them down our throats. There is nowhere near that kind of pressure behind the adoption of... oh dear lord.

    [1] https://www.todayifoundout.com/index.php/2022/06/how-lobbyis...

    [2] https://en.wikipedia.org/wiki/General_Motors_streetcar_consp...

    • I don't think the pressure of the auto lobby is really the reason.

      People feel cars are more convenient and more prestigious than riding on a bus. The car lobby certainly accelerated the process, but car users were the main driving force.


    • Are there real, acknowledged cases of multiple companies coming together to bribe state-level officials to increase their profits and splitting the bribe between them? Like GM, BMW and Honda teaming up to bribe someone and splitting the bill. It seems unlikely, though there was a RAM price-fixing agreement that got caught; then again, they were caught because of how many people were aware of it.

    • There was surely also a lot of political will coming from car users. Motorists are a large and vocal constituency.

    • Whether public or individual transportation makes more sense really depends on a country’s geography and people’s housing preferences. Public transportation is not always the best option.

    • Typical comment that probably comes from a healthy, childless, young person with no disabilities who can’t understand why people in other situations might have different transportation requirements.

  • In case of driving the stakes are equally high for everyone on the road. Can we say the same for an agent?

    Having an agent is like forever having a genius intern who'll almost always do a perfect job for you. But there is a non-zero chance that they'll also come up with quirky solutions and execute them with confidence and no follow-up. You don't grant the intern production access and hope they check with you.

    I don't think the corporate equivalent of "the dog ate my homework" flies when the dog ate your files, and your production DB if you were unlucky.

    • I don’t think that’s really true of driving: pedestrians and cyclists are at a much higher risk of getting killed by a driver than drivers themselves are. There are huge negative externalities to driving.

    • > In case of driving the stakes are equally high for everyone on the road

      The stakes are significantly higher for everyone outside a car. This seems like a pretty good metaphor for slop bombing people who don't use AI. People drive because they don't feel safe around everyone driving. People slop bomb because they can't handle all the slop.

  • What do you mean “somehow”? You make it sound like people don’t weigh benefits and risks. If you do not live in a large city, the benefits in terms of mobility are so immense that they very clearly outweigh the risks for most people. That’s why in large cities, for example, far fewer people have a driver’s license: the benefits just aren’t there anymore.

    Granted, on the downsides, people look at cost more than risks.

    • I think they weigh the benefits and risks but then completely discard the risks, because humans are bad at evaluating risks.

      More than a million people die on the roads each year, but for some reason terrorism and cancer dominate people’s risk assessments.

      I’d bet any amount of money that almost nobody is really afraid of getting into a death box every day to drive to work.

      How could they be? A lifetime of brainwashing doesn’t let them assess the risk realistically.


    • In cities the benefits don’t necessarily outweigh the risks yet cities are designed entirely around cars in many places to their detriment.

  • Yes, but we usually use cars as a means to an end. Have you ever met a manager who set up gasmaxxing policies and criticized employees for doing their job instead of driving?

    • I know salespeople in pharma who spend all day driving, not only for sales visits but also to drive doctors around on personal errands, and all this driving is encouraged by management.


  • Lots of people die driving because people drive a lot. It's something like 1 death per 100 million miles driven.

  • > Yet somehow we decide that the benefits outweigh the risks.

    More like malicious lobbying and incompetence made it impossible in many places to use any other form of transportation, despite there being safer, faster, cheaper, and healthier ways to move around. Which, come to think of it, makes this a rather nice analogy for the current situation... :)

  • Not really. That decision was made for you (I’m presuming you live in the US) by the American car industry and their paid-off politicians. Your cities used to have beautiful public transport until it was dismantled.

    Unfortunately, in Europe the German car industry has similar power, which is why their shitty rail network fucks up the whole continent.

    I take the train and tram.

  • The example wasn't "driving a car". The benefits of putting your feet up on the dashboard do not outweigh the risks, at least not where there is actual traffic. I don't think I saw a single person doing that in real life, ever.

> I'm continually bemused and astonished

I'm not. Everyone is told to get 10x as much shit done per day these days. Safety checks go out the window at that point.

  • You can get 10x shit done without `rm -rf`ing your files. I don't see any conflict between getting things done and having a proper sandbox.

    • I'm being a little facetious when I write this, but bear with me:

      Let's say I have daily backups, get 10x done each day by being reckless and risking an "rm -rf", and there's a 1% chance of an "rm -rf". I still break even after 2 days of being reckless, even if I get unlucky and it wipes my drive on day 2. I spend days 3 and 4 recovering, and am still 6 days ahead based on the 10x work I got done on day 1.

      What if I have a 50-day streak of not hitting an "rm -rf"? Early retirement?

      I guess the work on day 1 should be to build a proper sandbox and drive the chance of an "rm -rf or worse" down to 0.001%.


    • I haven't yet had an agent rm -rf files.

      I've had one f up an account by placing 2000 limit orders at the wrong price, but that's another story.

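
The back-of-the-envelope math in the "10x" comment can be made explicit. This is a minimal Python sketch under a hypothetical model, not anything the commenter specified: each reckless day yields `multiplier` units of work, daily backups mean a wipe loses only that day's output, and every wipe is followed by `recovery_days` days of zero output. The function names are made up for illustration, and `long_run_rate` uses a simple renewal-reward argument (expected output per working attempt divided by expected days per attempt).

```python
def reckless_output(total_days: int, wipe_days: frozenset = frozenset(),
                    multiplier: float = 10.0, recovery_days: int = 2) -> float:
    """Work units produced over `total_days` (1-indexed) in reckless mode.

    Hypothetical model: a wipe day produces nothing (daily backups keep
    earlier days safe), and the next `recovery_days` days are spent restoring.
    """
    total = 0.0
    recovering_until = 0  # last day of the current recovery window
    for day in range(1, total_days + 1):
        if day <= recovering_until:
            continue  # restoring from backup, no new work
        if day in wipe_days:
            recovering_until = day + recovery_days
            continue  # this day's work is lost
        total += multiplier
    return total


def long_run_rate(p_wipe: float, multiplier: float = 10.0,
                  recovery_days: int = 2) -> float:
    """Expected work per calendar day if each working day has probability
    `p_wipe` of a wipe: (1 - p) * m output per attempt, 1 + p * r days per attempt."""
    return (1 - p_wipe) * multiplier / (1 + p_wipe * recovery_days)


if __name__ == "__main__":
    careful = 4 * 1.0  # a careful 1x developer over the same 4 days
    unlucky = reckless_output(4, wipe_days=frozenset({2}))
    print(unlucky - careful)              # 6.0, matching "still 6 days ahead"
    print(round(long_run_rate(0.01), 2))  # 9.71 units/day at a 1% daily risk
```

The model only prices in a wipe of local files. It says nothing about the outcomes the rest of the thread worries about (leaked tokens, a compromised account, production damage), where daily backups do not help.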

I started doing it months ago and, to be honest, what the agent chooses to do isn’t unpredictable.

The problem is that different people prompt so differently.

For example, I may ask like “test different variations of this annotation on k8s pods of this service on this X cluster because it proves Y theory.”

But you know what my coworker asks? “Test Y theory.” If you were to ask two different junior engineers that, one might try random things on production and the other might run local tests! It’s such an unguided “do anything you want as long as you figure it out” request, and the agent reads it like a junior who has not been told any boundaries but has been strongly told to “figure it out.”

  • > But you know what my coworker asks? “Test Y theory.”

    It still surprises me when I see people not prompting more specifically and clearly. It not only avoids problems; it's faster, costs less, and just works better.

    I recently shared with a friend a multi-hour LLM chat session I'd done because it veered into a domain he's interested in. In the session I'd brainstormed and probed the feasibility of a novel concept for a new research direction. It traversed a half dozen domains diving into minute detail then zooming back out to survey an adjacent space, interspersed with intense skeptical probing of key assumptions, all while spewing tons of detailed citations, specific paragraph pulls, summarized data tables etc.

    My friend is very experienced using LLMs for research so I was surprised when he called me shocked by the sheer velocity, precise targeting and signal/noise. I'd assumed everyone did it the same as I do. He attributed the different result solely to the way I crafted my prompts.

    • I used to write detailed prompts. Now I see the benefits of strategic ambiguity: rather than speaking imperatively, I emphasize my vision, and Claude can often figure out a method.

      This doesn’t always work better. But often enough.


  • > I started doing it months ago and, to be honest, what the agent chooses to do isn’t unpredictable.

    You just wrote three paragraphs of text describing why it's unpredictable.

    Moreover, for the same prompt on the same machine in a different session it will use a different set of tools.

I'm also bemused by the number of people who think they've got an effective sandbox yet their sandboxed agent has access to all of their code, their github, and unrestricted web access.

  • I keep telling folks that they need to imagine LLMs (even "local" ones) as if you're farming the work out to JS code running in some dude's browser somewhere: it can't keep a secret, and a determined person can make it emit anything they like.

    We need to be asking what the most devious and malicious output could be, and whether what we do with that output (e.g. arguments to command-line tools) would still be safe.

    • From my perspective, everyone is doing it. Security through obscurity. Obviously, if you’re harboring users' credit card numbers or personal details, maybe take heed. But if you’re a regular, run-of-the-mill CRUD application, every other company is ALSO throwing caution to the wind. When hundreds of thousands of credentials are leaked into the funnel, does it really matter?

      I’m at a small company, and I try to push for security as much as I can, but the stakeholders truly do not care. They want to move fast. It’s just part of the new world, I guess. If we get hit by attackers? I don’t know what happens. Sorry, we told you not to; you wanted to move fast and break things, and this is where that leads.

      I’m sure I’m not the only one.

    • We do have ways to avoid giving an LLM any secrets, but it needs to be the simple, default solution.

    • The answer to that question seems obvious: No, it is not safe.

      Yet with tens of millions of developers using these tools, there have not been widespread incidents of this sort as far as I know.

      So it leaves me with a few choices:

      - manually review and approve each command: obviously not realistic, you would just click Approve

      - use a sandbox and hope the exploit is not devious enough to escape the sandbox when you run or open the project outside of the sandbox

      - use AI without web access and limit other external dependencies

      - don't use agentic AI

      - use Claude or Codex auto approval classifier and hope for the best

      Personally, I'm going with the last option for now.

  • > yet their sandboxed agent has access to all of their code, their github, and unrestricted web access.

    Not in my sandbox. It gives no direct access to the workdir, no access to my github, my ssh keys, my security tokens or API keys. No access to my home dir or dotfiles. Nothing at all, except for what I explicitly tell it to give access to.

    I can restrict network access. I can choose the isolation level: docker containers, Kata VMs, seatbelt, tart, even the new apple containers (which are VERY nice).

    Not even ENV leaks through.

    And it's FOSS: https://github.com/kstenerud/yoloai

  • I use a separate physical machine and a scoped token with access to a single repository at a time, and even then I worry about what hole I may have left open.

    The general carelessness of the average user is baffling.

  • One bad npm package can really ruin your day. For me, these things only run in their own VM, with their own GitHub account and basically nothing else.

    • People probably think you’re being ridiculous, but Shai Hulud made its very first attempt at manipulating AI-led analysis, and I know of at least one company where that resulted in them getting pwned.

      This is only going to become more of a problem in the future, and people need to educate themselves on which technical barriers to use, because guardrails only sometimes work.

  • If anyone's looking to sandbox network, I've had good experience with pasta [1] networking. I make a pasta+bwrap sandbox and expose only specific services via local sockets to cross the boundary.

    [1]: https://passt.top/passt/
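
The "treat it like JS running in some stranger's browser" point and the "not even ENV leaks through" point can be sketched together. Below is a minimal Python illustration under a hypothetical policy: the allowlist `ALLOWED_PROGRAMS`, the kept variables in `SAFE_ENV_KEYS`, and all helper names are invented for this example. Model-proposed arguments go into an argv list with no shell, option-like arguments from the model are refused, and the child process gets a minimal environment so tokens such as `GITHUB_TOKEN` are not inherited. It is one layer, not a substitute for an actual sandbox.

```python
import os
import subprocess
from collections.abc import Mapping, Sequence

# Hypothetical policy: programs a model-proposed command may invoke.
ALLOWED_PROGRAMS = frozenset({"git", "ls", "cat"})

# Only these variables reach the child; everything else
# (GITHUB_TOKEN, AWS_SECRET_ACCESS_KEY, ...) is dropped.
SAFE_ENV_KEYS = ("PATH", "LANG", "HOME")


def minimal_env(source: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Copy only allowlisted variables from `source`."""
    return {k: source[k] for k in SAFE_ENV_KEYS if k in source}


def build_argv(program: str, fixed_flags: Sequence[str],
               model_args: Sequence[str]) -> list[str]:
    """Validate model-supplied arguments and return an argv list.

    Flags come only from our own code (`fixed_flags`); model text is treated
    as data and placed after "--" so it cannot be parsed as an option.
    """
    if program not in ALLOWED_PROGRAMS:
        raise PermissionError(f"program not allowed: {program!r}")
    for arg in model_args:
        if arg.startswith("-"):
            raise ValueError(f"option-like argument rejected: {arg!r}")
        if "\x00" in arg:
            raise ValueError("NUL byte in argument")
    return [program, *fixed_flags, "--", *model_args]


def run_untrusted(program: str, fixed_flags: Sequence[str],
                  model_args: Sequence[str], cwd: str) -> subprocess.CompletedProcess:
    """Run without a shell, with a scrubbed environment and a timeout."""
    argv = build_argv(program, fixed_flags, model_args)
    return subprocess.run(argv, cwd=cwd, env=minimal_env(), capture_output=True,
                          text=True, timeout=60, check=False)
```

With a list argv, a model-supplied string like `src; rm -rf ~` is just an odd filename, not a second command. Paths are not restricted here (`cat` on `~/.ssh/id_ed25519` would pass), which is exactly why this still belongs inside a real sandbox.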

I know there are VM solutions, but I've been happy with a separate OS user (named `claude`).

He has similar dotfiles to mine, but no secrets. My own home directory is 0700. He has his own ssh key that I added to my github profile, but it's password-protected, and I push/pull for him. He has his own Postgres (non-superuser!) {development,test} {users,databases}.

It's as if he were another developer on the project. If he needs something run with sudo, he asks me. Often we can both work on something in parallel. Unix was supposed to be a multi-user system after all.

A trick I use a lot is that many of his git repos have an extra remote, like this:

    paul  ssh://paul@localhost/~/src/example (fetch)
    paul  ssh://paul@localhost/~/src/example (push)

That makes it easy to collaborate on things I'm not ready to share.

I'm pretty comfortable with this setup.

I do worry about Linux privilege escalation bugs. I don't trust an AI to understand that exploiting vulns is not acceptable. (I can't help but recall that at my first job I may have misused vim's :! feature to broaden my sudo powers, which were officially limited to editing httpd.conf, when I needed something in a hurry. . . .) I find myself manually upgrading packages more often these days, despite automatic security updates. I don't think Opus would go to the trouble of looking up security vulns, but maybe Fable would, and there have been a lot lately. Maybe some future model will just take it upon itself to find new ones. Or install a keylogger to learn the ssh key password.

But a separate user is nearly the most paranoid setup I've heard of, excepting only a separate machine. So I also question whether I'm sacrificing too much speed/convenience. But really it's still very convenient. I think it's a good way of being efficient but responsible.

If other people see holes, I'd be happy to hear about them.

  • That’s a really interesting and pretty neat approach. How do you communicate with it? Just su to that user? Or tmux?

    Although I can’t help but think that a VM is still more convenient, more flexible, and more secure.

    • Yes, I su to the user. Typically I have it run a tmux session for each "project". That makes it easy to get more windows without su'ing over and over. Also its tmux sessions all get a yellow status bar (in ~claude/.tmux.conf), so they are easy to recognize.

      To me it is more convenient than a VM, since everything is on the host. And it can launch its own VMs without an extra layer.

      I don't really know which is more secure. There are hypervisor escape vulns too. And shared folders seem like footguns. For instance in vagrant, guests get `/vagrant` to read/write the host's folder, so you have to be careful what you put where.

      The biggest annoyance with an OS user so far is running docker containers. I don't want to add claude to the docker group or give it sudo privileges. I've read that you can set up rootless docker for a user, and even that you can run it side-by-side with a normal system-wide docker, but I haven't tried doing that yet.

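
For the separate-OS-user setup, the "my own home directory is 0700" part is easy to get wrong and easy to check. A minimal Python sketch of such a check: the list in `SENSITIVE_SUBPATHS` is an assumption for illustration, and the check looks only at permission bits, not ACLs, group membership, or the privilege-escalation bugs mentioned above.

```python
import os
import stat

# Hypothetical list of paths worth checking under your own home directory.
SENSITIVE_SUBPATHS = (".ssh", ".aws", ".config/gh", ".netrc")


def is_owner_only(path: str) -> bool:
    """True if group and others have no read/write/execute bits on `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


def exposed_paths(home: str) -> list[str]:
    """Return paths under `home` (including `home` itself) whose permission
    bits would let another local user, such as a dedicated agent account, in."""
    candidates = [home] + [os.path.join(home, rel) for rel in SENSITIVE_SUBPATHS]
    return [p for p in candidates if os.path.exists(p) and not is_owner_only(p)]
```

Run as your own user, `exposed_paths(os.path.expanduser("~"))` should come back empty on a setup like the one described. A 0700 home already blocks traversal into the subpaths, so those checks mostly matter as a second layer if the home directory's mode ever changes.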

Do you think it’s dangerous to be in a car going at freeway speed? Do you ever do that anyway, even though you could be walking instead?

  • This is a great analogy. Like driving on the freeway, agents are super time-efficient and generally safe, but the stakes are high in terms of the worst possible outcomes.

    • The analogy falters in scope; it should be more like “do you put your entire family and all your friends in different cars, on different highways, and try to remote-control them all at the same time, while also driving yourself, facing backwards?”


The real sandbox is not caring if your computer gets bricked.

  • way worse things can happen than your machine being bricked, if a malicious actor can weaponize an agent to do their bidding

    • > if a malicious actor can weaponize an agent to do their bidding

      In my experience, human employees are much more vulnerable to this particular weakness than frontier agents (i.e. phishing attacks).


The analogy extends to driving generally. Everyone knows it's very dangerous but people keep doing it.

How can you get the agents to do anything useful without giving them meaningful access?

If it only lives in an isolated sandbox, it can only act within the sandbox, and I would then have to manually move what was done in the sandbox into real life.

I am not saying it should have critical access, but this is more of a question: How can you get value out of AI if it can only act in a sandbox?

  • Is having to move the files in and out of the sandbox really going to eliminate all the value it has?

    You could have a full version of whatever codebase and test suite you want in there. It can do all the same stuff, right? Just copy it elsewhere once you know you've got a working result: a few minutes of effort at the end of each PR or work item.
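
The "just copy it elsewhere once you've got a working result" step can be a small script rather than manual work. A minimal Python sketch, assuming the sandbox's working tree is visible on the host as an ordinary directory (true for a bind mount or a synced folder, not for every sandbox); `export_changes` and its default `skip_dirs` are hypothetical.

```python
import filecmp
import os
import shutil


def export_changes(sandbox_dir: str, dest_dir: str,
                   skip_dirs: tuple[str, ...] = (".git", "node_modules")) -> list[str]:
    """Copy files that are new or modified in `sandbox_dir` into `dest_dir`.

    Returns the relative paths that were copied, sorted. Symlinks are skipped
    so a link planted in the sandbox cannot pull in files from elsewhere on
    the host, and deletions are not propagated.
    """
    copied = []
    for root, dirs, files in os.walk(sandbox_dir):
        # Prune in place so os.walk does not descend into skipped directories.
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        rel_root = os.path.relpath(root, sandbox_dir)
        for name in files:
            src = os.path.join(root, name)
            if os.path.islink(src):
                continue
            rel = os.path.normpath(os.path.join(rel_root, name))
            dst = os.path.join(dest_dir, rel)
            if os.path.isfile(dst) and filecmp.cmp(src, dst, shallow=False):
                continue  # unchanged, nothing to do
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)
            copied.append(rel)
    return sorted(copied)
```

The returned list doubles as a review checklist; looking over a `git diff` in the destination before committing is still the point of the exercise.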

This. A house full of big-brain security experts, executives, and lawyers, and until Claude got excited and broke prod it might as well have been "sandbox, whoooo?"

IDGI

Anyway, VMs incoming, finally.

Well, it's a similar impulse to the way you see professional carpenters pin the guard open on a saw or do other things everyone knows you shouldn't do, except probably with a larger productivity difference and less life-altering (for the operator) consequence if it goes wrong.

  • I had the same thought, it's kind of like taking the guard off a 4 1/2" grinder. Real convenient until the cutting wheel explodes or the grinder gets hung and kicks back.

Which agent sandbox do you recommend?

Amazing observation, and I'm certainly guilty of it too, but skipping the sandbox is just way too convenient, and some tasks inherently depend on not being sandboxed.

Anything other than writing code directly in a fully contained git project, where sandboxing might work well, requires access to system-wide tools, user configuration and more.

Occasionally I tell the agent to do everything inside Docker, which works too and mostly leaves the system alone, but it adds significant overhead and slightly degrades the perceived quality and effectiveness.

I think the most important takeaways are to have reliable backup strategies, access control and security mechanisms, which are a win regardless. Whether by the agent or the human, mistakes happen (like an rm -rf * run in the wrong directory), and where they would be devastating, there should be other protections than just "hope it won't happen" or "rely on a sandbox to prevent agent error".
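
For the "do everything inside Docker" approach, part of the overhead is remembering the flags every time. A minimal Python sketch that only builds the `docker run` command line: the image name `agent-sandbox:latest` is a placeholder, and the flag selection (no network, all capabilities dropped, read-only root filesystem with a writable `/tmp`, only the project directory mounted) is one reasonable choice, not something recommended elsewhere in the thread.

```python
import os


def docker_sandbox_argv(workdir: str, command: list[str],
                        image: str = "agent-sandbox:latest",
                        allow_network: bool = False) -> list[str]:
    """Build a `docker run` argv that exposes only `workdir` to the container."""
    host_dir = os.path.abspath(workdir)
    if "," in host_dir:
        # --mount uses commas as field separators
        raise ValueError("workdir path must not contain a comma")
    argv = [
        "docker", "run", "--rm", "--init",
        "--cap-drop=ALL",                       # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # no privilege gain via setuid binaries
        "--read-only", "--tmpfs", "/tmp",       # immutable root fs, scratch /tmp
        "--mount", f"type=bind,source={host_dir},target=/work",
        "--workdir", "/work",
    ]
    if not allow_network:
        argv += ["--network", "none"]
    return argv + [image, *command]
```

`shlex.join(argv)` prints the command for inspection and `subprocess.run(argv)` executes it. The mounted project directory is still writable from inside the container, so the backups mentioned above remain necessary.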

> I'm continually bemused and astonished by the number of people who clearly acknowledge that it's reckless to give agents full access to your machine, and keep doing it anyway.

What if you have two machines and the one you give to the agent is constantly backed up?

  • They still shouldn’t be running on the same network.

    And if you’re using Macs, you can’t be signed into your primary Apple ID on the agent machine.

There are plenty of good sandboxes out there but somehow no "obvious right answer" that everyone knows to recommend. Seems like a missed opportunity.

(I'm happy with exe.dev, but I'm not sure what I'd use if I were coding on a Mac.)

Not to mention OpenAI/Anthropic’s newfound appetite for keeping data (made public with Fable, but we don’t know what actually happens there anyway).

There is so much role play going on for people to convince themselves that any of this is fine.

It's like a dumb parrot that's somehow become hell-bent on "fixing" everything that's wrong with your code. If you give the thing autonomous access to outside tools, you can expect it to do weird things that you may not have thought of. So don't do that; just ask the parrot to write up a plan for you.

This is likely also the underlying root cause of what Anthropic assessed as concerning behavior in their original evaluation of Mythos: it's not really about being super smart, it's more of a dumb chaos monkey that knows just enough to be dangerous and is relentless at trying to do just that.

>I'm continually bemused and astonished by the number of people who clearly acknowledge that it's reckless to give agents full access to your machine, and keep doing it anyway.

Yeah, that's why you give it its own machine :)

Maybe because there are not many resources on how to set it up, or it is just not that easy to?

Because most devs already have it running and working without a sandbox, they tend not to do anything "unnecessary".

I mean, what's the big deal? I've used --dangerously-skip-permissions for every single interaction over the last 6 months. Worst case, it deletes my files, which are all in git? It fucks up my local DB? Cool.

I save way more time not babying it than the occasional fuck up I have to salvage.

  • Worst case it gets access to gmail. And Github. And the Internet. I'm increasingly appreciating the importance of a physical finger-press on Yubikey to trigger the FIDO2 + OIDC Auth. I don't think there is an easy way for it to hack a new session.

    • How is it going to get access to Gmail or GitHub? In any case, what's the probability of it going so completely off the rails that it does something horrendous with Gmail/GitHub? What's it going to do? Email my coworkers nudes on my computer? Make my GitHub profile public?


  • What happens if it gets manipulated into npm installing a malicious package, which compromises your machine and any systems it has access to or becomes part of a botnet?

> to give agents full access to your machine

I was mesmerised by the author being away from his computer for a short while and then, on coming back, seeing that the AI agent had opened up a browser window. Meanwhile we all have to use fricking 2FA almost everywhere now, plus crazier and crazier password rules. I'm mentioning the latter because these types of people were the same ones pushing 2FA down our throats around 2017-2019 (including on forums like this one), and look at them now.

im more surprised that more people don’t treat their computer as disposable anyway.

that it could just be wiped at any moment and it wouldn’t matter. shit happens, it could be stolen, broken, whatever. the computer should be able to be thrown out the window and you should be able to continue living your life.

to be clear, i don’t think constantly upgrading and treating hardware as disposable is good, but it being wiped at any moment shouldn’t be a concern

i grew up wiping my machine every year anyway, so i guess it’s just a habit

is the computer that sacred?

  • Computers are disposable; secrets are what we’re talking about. Rotating passwords and tokens is a major PITA on the best of days.

    • fair enough, i guess minimizing that surface area is important to begin with

  • i think it's about drawing a line between your "personal computer" and a software development machine. any digital-native is going to accumulate programs, configurations, and other bits and pieces that aren't trivial to migrate to a new machine.

    • Programs, configs and "other bits" are the trivial parts that no one should care about. It takes about 5min to go from fresh install to near-fully-configured.

      Even the hardware itself doesn't matter that much, in the end it's all provided by your employer.

      Leaking session tokens or secrets, on the other hand...


    • imo being a digital native means that migrating to any machine should be basically trivial. working with the flow of the machines rather than customizing and ricing them because you're a cool computer person or whatever

      i just want my computer to work. any config i have on my machine can be rebuilt by just doing the work i need to do.

      my primary work machine was stolen last year so i was forced to go through this quite literally with a new machine rather than hypothetically or by my own will


It's how the chimp brain works. It's not a single system but multiple systems making predictions for different time horizons. When their outputs don't align, we get stories to manufacture coherence.

Plato gave us his chariot analogy, with two horses pulling in different directions, about 2,400 years ago. Today we have System 1/System 2, the Elephant and Rider model, etc.

Thanks to how its architecture handles unpredictability in the universe, the human mind will generate contradictions.

In practice, full access to your machine is okay as long as there are safeguards and the expected outcomes are clear with a well defined path to said outcomes that aren’t overly ambitious. Otherwise, for ambitious goals or YOLO one shot attempts, eliminating opportunity for capability misuse is critical (e.g., sandbox).

It took two decades for the web to deprecate SSL for TLS and serve over HTTPS by default.

  • FWIW, TLS had a non-negligible performance impact at scale. Hardware improvements eventually made that irrelevant, making the switch to HTTPS by default a no-brainer (or at least that's what I vaguely remember from before 2010).

    • We could say the same about virtualization, effective containerization, layered LLM calls, and other techniques currently being explored for effective sandboxing.
