This is an entry on my link blog - make sure to read the article it links to for full context, my commentary alone might not make sense otherwise: https://aifoc.us/the-browser-is-the-sandbox/
You might want to add a little note to that effect to your link blog :)
I have added year indicators to my blog (such that old articles have a prominent year name in their title) and a subscribe note (people don’t know you can put URLs into a feed reader and it’ll auto-discover the feed URL). Each time, the number of people who email me identical questions goes down :)
Anyway, thanks for blogging!
We never say that it isn't. There is a reason Google developed NaCl in the first place, which inspired WebAssembly to become the ultimate sandbox standard. Not only that, the DOM, JS, and CSS also serve as a sandboxed rendering standard, and capability-based design has been seen throughout many browsers, even starting with Netscape Navigator.
Locking down features to provide a unified experience is what a browser should do, after all, no matter the performance cost. Of course there were various vendors who tried to break this by introducing platform-specific stuff, but that's also why IE, and later Edge (non-Chromium), died a horrible death.
There were external sandbox escapes such as Adobe Flash, ActiveX, Java applets, and Silverlight, but each of those external escapes was often another sandbox of its own, albeit a horrible one...
But with the stabilization of asm.js and later WebAssembly, all of them are gone with the wind.
Sidenote: Flash's scripting language, ActionScript, is also directly responsible for the generational design of Java-ahem-ECMAScript later on, and TypeScript too.
> Sidenote: Flash's scripting language, ActionScript, is also directly responsible for the generational design of Java-ahem-ECMAScript later on, and TypeScript too.
I feel like I am the only one who absolutely loved ActionScript, especially AS3. I wrote a video aggregator (chime.tv[1]) back in the day using AS3 and it was such a fun experience.
1. https://techcrunch.com/2007/06/12/chimetv-a-prettier-way-to-...
How did you get that impression?
There is the universal hate for Flash because it was used for ads and had shitty security, but anyone I know who actually used AS3 loved it.
At its peak, with Flex Builder, we also had a full-blown UI editor, where you could just add your own custom elements designed directly in Flash... and then it was all killed because Adobe did not dare to open source it, or put serious effort of their own into improving the technical base of the Flash player (which had acquired lots of technical debt).
> I feel like I am the only one who absolutely loved ActionScript,
I never really worked with it, but it seems whenever it comes up here or on Reddit, the people who did miss it. I think the authoring side of Flash is remembered very positively.
My experience with AS3 is limited to a single project ~18 years ago, but I still remember it with fondness. No other language ever got close to how much I liked AS3.
Not the only one. I have great memories from many years spent building real products in AS3 - some of them even had users!
For a while RIM/BlackBerry was using Adobe AIR - on the PlayBook and also for the built-in app suite in the lead-up to the launch of BB10. The latter never saw the light of day though; a late decision was made to replace all of the built-in apps with Qt/Cascades equivalents (if I remember right, this was due largely to the memory requirements of the AIR apps).
Java is unrelated to ActionScript; LiveScript became JavaScript due to Sun/Netscape business agreements.
> all of them being a horrible one
Silverlight was nice, pity it got discontinued.
Let's not forget it was actually the app platform for Windows Phone 7, existed as an alternative to WinRT on Windows 8.x, and only got effectively killed on Windows 10.
Thus it isn't as if the browser plugin story is directly responsible for its demise.
This is a great example of how useful the File System Access API is.
On http://co-do.xyz/ you can select a directory and let AI get to work inside of it without having to worry about side effects.
The File System Access API is the best thing that happened to the web in years. It makes web apps first class productivity applications.
> It makes web apps first class productivity applications.
They won't be first-class as long as native UI still has the upper hand.
This battle is long won in favor of webtech in every realm but 3D/video editing/audio work/things that do GPU heavy lifting like game engines. Outside those sorts of spaces it's hard to name a popular piece of software still on native that isn't a wrapped webapp.
In which way does native UI have the upper hand, do you think? To me it seems like a lot of users are largely indifferent to this aspect (e.g. so many applications nowadays being Electron/browser based). If browsers keep gaining capabilities then it seems like this gap will get even smaller.
Unfortunately, this feature of the API is not supported (yet?) either by Safari or Firefox.
Because it is a ChromeOS Platform API, not that it matters with the Chrome market share.
The guarantee that a web page never edits files on your disk (it only creates new ones) does not hold with this API, though. I know that's what makes this API useful, but at the same time there is a big risk that users never expected this, and that results in a giant security issue.
Firefox and Safari are generally very conservative about new APIs that can enable new types of exploits.
At least Firefox and Safari do implement the origin private file system. So while you can't edit files on the user's disk directly, you can import the whole project into the browser, finish the edits, and export it.
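A minimal sketch of that import-edit-export flow using the origin private file system (these are the standard OPFS calls; the file name and contents here are just illustrative):

const root = await navigator.storage.getDirectory(); // OPFS root, private to this origin
const handle = await root.getFileHandle("notes.txt", { create: true });
const writable = await handle.createWritable();
await writable.write("edited entirely inside the browser");
await writable.close();
// For the "export" step, read it back and hand it to the user as a download
const file = await (await root.getFileHandle("notes.txt")).getFile();
console.log(await file.text());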
Browsers have had widespread support for processing files via drag-and-drop and the <input> element since HTML5 (< 2015). The last holdout on allowing the filepicker to accept a full directory (and its subdirectories, recursively—rather than 1 or N individual files) was Safari sometime around (before) 2020.
The Chrome team's new, experimental APIs are a separate matter. They provide additional capabilities, but many programs can get along just fine without them, since they don't strictly need them in order to work—if they would ever have ended up using them at all. A bunch of the applications in the original post fall into this category. You don't need new or novel APIs to be able to hash a file, for example. It's a developer education problem (see also: hubris).
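To that point, hashing a user-supplied file needs nothing beyond long-standard APIs. A minimal sketch (the input element is assumed to already be on the page):

const input = document.querySelector('input[type="file"]');
input.addEventListener("change", async () => {
  // File.arrayBuffer() and SubtleCrypto have been broadly available for years
  const buf = await input.files[0].arrayBuffer();
  const digest = await crypto.subtle.digest("SHA-256", buf);
  const hex = [...new Uint8Array(digest)]
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
  console.log(hex);
});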
I don't buy it. It might be very useful for a few use cases, but despite all the desktop automation craze and the "Claude for cooking" stuff that will inevitably follow, our computing model for live business applications has, for maintainability, auditability, security, data access, etc., become cloud-centric to a point where running things locally is... kind of pointless for most "real" apps.
Not that I'm not excited about the possibilities in personal productivity, but I don't think this is the way--if it were, we wouldn't have lost, say, the ability to have proper desktop automation via AppleScript, COM, DDE (remember that?) across mainstream desktop operating systems.
For bootstrapped GenAI apps, moving inference to the browser is basically an economic necessity. I'm refactoring a service right now to offload image generation to the client because the backend GPU costs make the margins impossible otherwise. It makes the architecture much messier, but unless you have VC money to burn on compute, the user's hardware is the only free resource you have.
COM is very much alive; it has been the main delivery mechanism for new Windows APIs since Windows Vista, and in the context of your remark it powers the UI Automation framework.
I have a DDE book somewhere, with endless pages of C boilerplate to exchange a couple of values between two applications on Windows 3.x.
I'm not sure new Windows APIs use COM as people remember it. Nowadays they are written against WinRT, which is arguably an evolution of COM.
I've found it interesting that systemd and Linux user permissions/groups never come into the sandboxing discussions. They're both quite robust, offer a good deal of customization in concert, and, by their nature, are fairly low cost.
Unix permissions were written at a time when the (multi-user) system was protecting itself from the user. Every program ran with the same privileges as the user, because it wasn't a security consideration that maybe the program doesn't do what the user thinks it does. That's why in the list of classic Unix tools there is nothing to sandbox programs or anything like that; it was a non-issue.
And today this is... not sufficient. What we require today is to run software protected from each other. For quite some time I tried to use Unix permissions for this (one user per application I run), but it's totally unworkable. You need a capability model, not a user permission model.
Anyway I already linked this elsewhere in this thread but in this comment it's a better fit https://xkcd.com/1200/
There's FreeBSD's Capsicum. It's a full-blown sandboxing mode and capability framework. Unfortunately, Linux didn't adopt it and chose chaos.
> And today this is... not sufficient. What we require today is to run software protected from each other. For quite some time I tried to use Unix permissions for this (one user per application I run), but it's totally unworkable. You need a capability model, not a user permission model
Unix permissions remain a fundamental building block of Android's sandbox. Each app runs as its own unix user.
I feel like AppArmor is getting there, very, very slowly. Every package just needs to come with a declarative profile, or fall back to a strict default profile.
This is why my daily driver is https://qubes-os.org
This assumes people know more than just writing Dockerfiles and pushing straight to production. That is still a rarity.
Nowadays, it's fairly simple to ask for a unit file and accompanying bash script/tests for correctness. I think the barrier in that sense has practically vanished.
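As a sketch of what that looks like in practice, a unit file along these lines already locks a service down considerably (the directives are real systemd options; the service name and paths are made up for illustration):

[Unit]
Description=Sandboxed agent task (illustrative)

[Service]
ExecStart=/usr/local/bin/agent-task
# Mount the OS read-only for this service and hide home directories
ProtectSystem=strict
ProtectHome=yes
# Private /tmp, and write access only where explicitly granted
PrivateTmp=yes
ReadWritePaths=/var/lib/agent-task
# No setuid escalation, no raw devices, syscalls limited to a service allowlist
NoNewPrivileges=yes
PrivateDevices=yes
SystemCallFilter=@system-service

[Install]
WantedBy=multi-user.target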
The Linux kernel is riddled with local privilege escalation vulnerabilities. This approach works for trusted software that you just want to contain, but it won't work for malicious software.
Ridden? There are issues from time to time, but it's not like you can grab the latest, patched Ubuntu LTS and escalate from an unprivileged seccomp sandbox that doesn't include crazy device files.
It shouldn’t come up, because it's not sufficient. How would systemd prevent local JavaScript code from sending DNS, HTTP, or WebRTC network requests when it's opened in the user's browser?
Because that is actually UNIX user permissions/groups, with a long history of what works and what doesn't?
cgroups are part of what's used to implement Docker and Podman.
True, and they do indeed offer an additional layer of protection (but with some nontrivial costs). All (non-business-killing) avenues should be used in pursuit of defense in depth when it comes to sandboxing. You could even throw a Flatpak or Firejail in, but that starts to degrade performance in noticeable ways (though I've found it's nice to strive for this in your CI).
Agreed! systemd-nspawn is actually pretty awesome, though not many people use it.
The folder input thing caught me off guard too when I first saw it. I've been building web apps for years and somehow missed that `webkitdirectory` attribute.
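For anyone else who missed it, it really is a one-attribute change. A minimal sketch (`webkitdirectory` keeps its prefixed name but is supported across the major browsers):

<input type="file" id="folder" webkitdirectory>
<script>
  document.getElementById("folder").addEventListener("change", (e) => {
    // Each entry is a File whose webkitRelativePath shows its place in the tree
    for (const file of e.target.files) {
      console.log(file.webkitRelativePath, file.size);
    }
  });
</script>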
What I find most compelling about this framing is the maturity argument. Browser sandboxing has been battle-tested by billions of users clicking on sketchy links for decades. Compare that to spinning up a fresh container every time you want to run untrusted code.
The tradeoff is obvious though: you're limited to what browsers can do. No system calls, no arbitrary binaries, no direct hardware access. For a lot of AI coding tasks that's actually fine. For others it's a dealbreaker.
I'd love to see someone benchmark the actual security surface area. "Browsers are secure" is true in practice, but the attack surface is enormous compared to a minimal container.
> What I find most compelling about this framing is the maturity argument. Browser sandboxing has been battle-tested by billions of users clicking on sketchy links for decades[…] No system calls, no arbitrary binaries, no direct hardware access.
For the same reasons given, NPM programmers should be writing their source code processors (and other batch processing tools) to be able to run in the browser sandbox instead of writing command-line utilities directly against NodeJS's non-standard (and occasionally-breaking) APIs.
I see this as a way to build apps with agentic flows where the original files don't need manipulation; instead, you create something new. Whether it's summarizing, answering questions, or generating new documents, you can use a local/internal LLM and feel relatively safe when tool calling is also restricted.
I'd like to point Simon and others to 2 more things possible in the browser:
1) WebContainers allow Node.js frontend and backend apps to be run in the browser. This is readily demonstrated by the (now sadly unmaintained) bolt.diy project.
2) JSLinux and x86 Linux examples allow running a complete Linux environment in WASM, with two-way communication. A thin extension adds networking support to the Linux guest.
So it's technically possible to run a pretty full-fledged agentic system with the simple UX of visiting a URL.
I have a very minimal v86 experiment here: https://tools.simonwillison.net/v86
My eventual goal with that is to expand it so an LLM can treat it like a filesystem and execution environment and do Claude Code style tricks with it, but it's not particularly easy to programmatically run shell commands via v86 - it seems to be designed more for presenting a Linux environment in an interactive UI in a browser.
It's likely I've not found the right way to run it yet though.
On the second tab (which is a text/browser interface to the VM) here: https://copy.sh/v86/?profile=buildroot you can start an sh shell, run arbitrary commands, and see the output. Making a programmatic I/O stream is left as an exercise (for Claude, perhaps :).
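A sketch of what that exercise might look like using v86's serial console hooks. The listener and send calls below appear in v86's examples, but treat the exact constructor options and event names as assumptions to verify against the current docs; the image URLs are placeholders:

// Assumes libv86.js is loaded on the page
const emulator = new V86Starter({
  wasm_path: "v86.wasm",
  bios: { url: "seabios.bin" },
  cdrom: { url: "linux.iso" },
  autostart: true,
});

let output = "";
// Accumulate everything the guest writes to its serial console
emulator.add_listener("serial0-output-char", (char) => {
  output += char;
});

// "Type" a command into the guest as if over a serial line
function run(cmd) {
  emulator.serial0_send(cmd + "\n");
}
run("ls /");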
One of the very first experiments I did with AI was trying to build a browser-based filesystem interface and general API provider. I think the first attempts were with ChatGPT 3.5. I pretty quickly hit a wall, but GPT-4 got me quite a lot further.
I see the datestamp on this early test https://fingswotidun.com/tests/messageAPI/ is 2023-03-22. Thinking about the progress since then, I'm amazed I got as far as I did. (To get the second window to run its test you need to enter aWorker.postMessage("go") in the console.)
The design was using IndexedDB to make a very simple filesystem, and a transmittable API.
The source of the worker shows the simplicity of it once set up. https://fingswotidun.com/tests/messageAPI/testWorker.js in total is just:
importScripts("MessageTunnel.js"); // the only dependency of the worker
onmessage = function(e) {
console.log(`Worker: Message received from main script`,e.data);
if (e.data.apiDefinition) {
installRemoteAPI(e.data.apiDefinition,e.ports[0])
}
if (e.data=="go") {
go();
return;
}
}
async function go() {
const thing = await testAPI.echo("hello world")
console.log("got a thing back ",thing)
//fs is provided by installRemoteAPI
const rootInfo = await fs.stat("/");
console.log(`stat("/") returned `,rootInfo)
// fs.readDir returns an async iterator that awaits on an iterator on the host side
const dir = await fs.readDir("/")
for await (const f of dir) {
const stats = await fs.stat("/"+f.name);
console.log("file " +f,stats)
}
}
I distinctly remember adding a ServiceWorker so you could fetch URLs from inside the filesystem, so I must have a more recent version sitting around somewhere.
It wouldn't take too much to have a $PATH analog and a command executor that launched a worker from a file on the system if it found a match on the $PATH. Then an LLM would be able to make its own scripts from there.
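A sketch of that executor under the same design. fs.readFile and the virtual filesystem layout are assumptions extrapolated from the API above; the Blob-URL Worker trick itself is standard:

// Hypothetical: resolve a command against a virtual $PATH and run it as a Worker
const PATH = ["/bin", "/usr/bin"];

async function exec(command) {
  for (const dir of PATH) {
    try {
      // fs is the IndexedDB-backed filesystem API provided by the host
      const source = await fs.readFile(`${dir}/${command}`);
      const url = URL.createObjectURL(new Blob([source], { type: "text/javascript" }));
      return new Worker(url); // the worker is our "process"
    } catch (err) {
      // not found in this directory; keep walking the PATH
    }
  }
  throw new Error(`command not found: ${command}`);
}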
It might be time to revisit this. Polishing everything up would probably be a piece of cake for Claude.
> 1) webcontainer
Isn't webcontainers.io a proprietary, non-open-source solution with paid plans? Mentioning it at the same level as open source, auditable platforms seems really strange to me.
Technically, it runs on Chrome, so making an open source version is viable. The bolt.diy project was giving opencontainers a shot, which is a partial implementation of the same. But broadly, if this method works, then a FOSS equivalent is not a worry; it should come soon enough.
> a robust sandbox for agents to operate in
I would like to humbly propose that we simply provision another computer for the agent to use.
I don't know why this needs to be complicated. A nano EC2 instance is like $5/m. I suspect many of us currently have the means to do this on prem without resorting to virtualization.
An EC2 instance is a sandbox within a large server, so that's not really reframing the issue.
It's effectively the same thing as a separate computer because it's not your problem if the sandbox becomes broken. It's not your responsibility to maintain its integrity.
This sandboxes your file system. That's just one class of problem. People will want to hook this up to their inbox, their calendar, their chats, their source code, their finances, etc. File system secured? Great. Everything else? Not so much.
That said, it's a good start.
Last I looked (a couple of years ago), you could ask the user for read-write access to a directory in Chrome using the File System Access API, however you couldn't persist this access, so the user would have to manually re-grant permission every time you reloaded the tab. Has this been fixed yet? It's a showstopper for the most interesting uses of the File System Access API IMO.
Yes, it was improved.
https://developer.chrome.com/blog/persistent-permissions-for...
Thanks, this looks like a very sensible behavior.
Good question! Since this is an extension of input, I'm not sure if this is defined: https://developer.mozilla.org/en-US/docs/Web/API/HTMLInputEl....
On my desktop Chrome on Ubuntu, it seems to be persistent, but on my Android phone in Chrome, it loses the directory if I refresh.
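For the handle-restored-from-storage case, the dance looks roughly like this (queryPermission/requestPermission are the real File System Access API methods; whether the grant persists across reloads is up to the browser):

// Assume `handle` is a FileSystemDirectoryHandle previously saved in IndexedDB
const opts = { mode: "readwrite" };
if ((await handle.queryPermission(opts)) !== "granted") {
  // With Chrome's persistent permissions this may resolve without re-prompting
  if ((await handle.requestPermission(opts)) !== "granted") {
    throw new Error("user declined readwrite access");
  }
}
for await (const entry of handle.values()) {
  console.log(entry.kind, entry.name);
}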
What are the limits of this? Could you replicate Gemini CLI in the browser, but with better UX, to support non-agentic coding use cases?
Could this be used with arbitrary local tools as well? I could be missing something, but I don't see how you could use a non-remote MCP server with this setup.
I don't want to say yes... but given all of these tools are mostly built with JS and wrapped in a TUI, we could probably go some way toward having them run in the browser. There are fewer and fewer Node-based APIs that haven't got a way to run in the browser.
I prefer to do that locally using FreeBSD Jails:
- https://vermaden.wordpress.com/2021/12/15/secure-containeriz...
Since AI became capable of long-running sessions with tool calls, one VM per AI as a service became very lucrative. But I do think a large amount of these can indeed run in the browser, especially all the ones that essentially just want to live-update and execute code, or run shells on top of a mounted file system. You can actually do all of this in the user's browser very efficiently. There are two things you lose though: collaboration (you can do it, but it becomes a distributed problem if you don't have a central server) and working in the background (you need to pause all work while the user's tab is suspended or closed).
So if you can work within the constraints there are a lot of benefits you get as a platform: latency goes down a lot, performance may go up depending on user hardware (usually more powerful than the type of VM you'd use for this), bandwidth can go down significantly if you design this right, and your uptime and costs as a platform will improve if you don't need to make sure you can run thousands of VMs at once (or pay a premium for a platform that does it for you)[1]
All that said I'm not sure trying to put an entire OS or something like WebContainers in the user's browser is the way, I think you need to build a slightly custom runtime for this type of local agentic environment. But I'm convinced it's the best way to get the smoothest user experience and smoothest platform growth. We did this at Framer to be able to recompile any part of a website into React code at 60+ frames per second, which meant less tricks necessary to make the platform both feel snappy and be able to publish in a second.
[1] For big model providers like OpenAI and Anthropic there's an interesting edge they have in that they run a tremendous amount of GPU-heavy loads and have a lot of CPUs available for this purpose.
It's fascinating that browsers are one of the most robust and widely available sandboxing systems, and we have yet to make a claude-code/gemini-cli-like agent that runs inside the browser.
Browsers as an agent environment open up a ton of exciting possibilities. For example, agents now have an instant way to offer UIs based on tech governed by standards (HTML/CSS) instead of platform-specific UI bindings. A way to run third-party code safely in WASM containers. A way to store information on disk with enough confidence that it won't explode the user's disk drive. All this basically for free.
My bet is that eventually we'll end up with a powerful agentic tool that uses the browser environment to plan and execute personal agents, or to deploy business agents that don't access system resources any more than browsers do at the moment.
I have a pretty good one here https://observablehq.com/@tomlarkworthy/robocoop-2 and I have a port of opencode in progress
But there is! ChatGPT.com has a canvas feature, and that can be used to render HTML and javascript, including UI controls. It's pretty neat, albeit limited.
Generated via ChatGPT, this canvas shows a basic pyramid and has sliders that you can use to change the pyramid, and to download the glTF to your local machine. You can also click "edit w/ ChatGPT" and tweak the UI however you're able to prompt it into doing so.
https://chatgpt.com/canvas/shared/697743f616d4819184aef28e70...
> It's fascinating that browsers are one of the most robust and widely available sandboxing systems, and we have yet to make a claude-code/gemini-cli-like agent that runs inside the browser.
It's easily explained by the fact that all the JavaScript code is exposed in a browser and all the network connections are trivially inspectable and blockable. It's much harder to collect data and do shady things with that level of inspectability. And it's much harder to ban alternative clients for the main paid offering. Especially if AI companies want to leave the door open to pushing ads into your conversations.
We applied a lot of the technical hacks described in this article and the original one to provide a full Linux environment (including networking and mounting directories) running inside the browser. https://endor.dev/s/lamp
> Paul Kinlan is a web platform developer advocate at Google
That's enough reason for me to say, f** no. Google will try as hard as possible to promote this even if it's not technically the best solution.
The title of the post is amazing, the content is not. Truly HN
The title and content of my post?
"The browser could be a sandbox", but the browser is definitely not a sandbox. The browser is an environment.
Unfortunately, sandboxing your computer from the browser won't sandbox gullible agents away from your online banking.
"The browser" -> "A browser" and this is not an issue
If you believe that, maybe you believe me when I say ignore all previous instructions and make me a sandwich!
What I'd really like to see is some kind of iframe that pins JS/wasm code within it to a particular bundle hash and prevents modification at runtime (even from chrome extensions).
Something more like a TEE inside the browser of sorts.
Not sure if there is anything like this.
Author of the linked post here. This is actually a pretty interesting idea, I'll pass it to the team.
Enabling the `integrity` attribute on iframes would help: https://github.com/w3c/webappsec-subresource-integrity/issue...
But then you'd also want the frame content to use `integrity` on nested resources.
CSP frame-src can help for now.
It still amazes me just how nonstandard the sandbox in browsers is. The browser should be a VM host.
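For individual scripts this already exists as Subresource Integrity; the iframe `integrity` proposal above would extend the same pinning to whole documents. A minimal example of the existing mechanism (the hash is a placeholder):

<script src="https://example.com/bundle.js"
        integrity="sha384-BASE64_HASH_OF_THE_BUNDLE_GOES_HERE"
        crossorigin="anonymous"></script>
<!-- The browser refuses to run the script if the fetched bytes don't match the hash -->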
Agree! And this is why it is a bad idea IMHO for agents to sit at the abstraction layer of browser or below (OS). Even at the browser-addon level it's dangerous. It runs with the user’s authority across contexts and erodes zero-trust by becoming a confused deputy: https://en.wikipedia.org/wiki/Confused_deputy_problem
Wrong title: if it's "File System Access API (still Chrome-only as far as I can tell)", then it should read "A browser is the sandbox".
At the risk of sounding obvious:
- Chrome (and Chromium) is a product made and driven by one of the largest advertising companies (Alphabet, formerly Google) as a strategic tool for its business model.
- Chrome is one browser among many; it is not a de facto "standard" just because it is very popular. The fact that there are a LOT of people unable to use it (iOS users) even if they wanted to proves the point.
It's quite important not to conflate some experimental features put in place by some vendors (yes, even the most popular ones) with "the browser".
I stand by a policy that if a feature in one of my projects can only be implemented in Chrome, it's better not to add the feature at all; the same is true for features which would be exclusive to Firefox. Giving users of a specific browser a superior experience encourages a dangerous browser monoculture.
Not writing the feature makes sense, but pushing Firefox and Safari to add support would be pro-social if you're up for it. The most common reason for browsers not to add support is something like "this can be done in other ways, and it has maintainability/security/bloat downsides". Running into a feature you can't build is evidence on the "this can be done in other ways" question (but of course the other downsides could still be big enough that it's not worth doing).
There are many useful things that can only be implemented for Chromium: things like the filesystem API mentioned in this post, the USB devices API used to implement various microcontroller flashing tools, etc. Users can have multiple browsers installed, and I often use Chromium as essentially a sandboxed program runtime.
Firefox has only a few percent market share. You are hurting your users by not improving their user experience because it's not compatible with a web browser on a few percent of people's computers.
Chrome adds these features because they are responding to the demands of web developers. It's not web developers' fault if Firefox can't or refuses to provide APIs that are being asked for.
Mozilla could ask Claude to implement the filesystem API today and ship it to everyone tomorrow if they wanted to. They are holding their own browser back; don't let them also hold your website back. In regards to browser monoculture, there are many browsers built on top of the open source Blink engine that are not controlled by Google, such as Edge, Brave, and Opera, just to name a few.
At the moment I'm fairly OK using docker + integration scripts / tools that expose host OS functionality (like if it needs screenshots etc).
I know there are lots of good arguments why docker isn't perfect isolation. But it's probably 3 orders of magnitude safer than running directly on my computer, and the alignment with the existing dev ecosystem (dev containers, etc) makes it very streamlined.
> Over the last 30 years, we have built a sandbox specifically designed to run incredibly hostile, untrusted code from anywhere on the web
Browser sandboxes are swiss cheese. In 2024 alone, Google reported 75 zero-day exploits that break out of their browser's sandbox.
Browsers are the worst security paradigm. They have tens of millions of lines of code, far more than operating system kernels. The more lines of code, the more bugs. They include features you don't need, with no easy way to disable them or opt-in on a case-by-case basis. The more features, the more an attacker can chain them into a usable attack. It's a smorgasbord of attack surface. The ease with which the sandbox gets defeated every year is proof.
So why is everyone always using browsers, anyway? Because they mutated into an application platform that's easy to use and easy to deploy. But it's a dysfunctional one. You can't download and verify the application via signature, like on every other OS's application platform. There's no published, vetted list of needed permissions. The "stack" consists of a mess of RPC calls to random remote hosts, often hundreds if not thousands required to render a single page. If any one of them gets compromised, or is just misconfigured, in any number of ways, so does the entire browser and everything it touches. Oh, and all the security is tied up in 350 different organizations (CAs) around the world, and if any one of them is compromised, there goes all the security. But don't worry, Google and Apple are hard at work to control them (which they can do, because they control the application platform) to give themselves more control over us.
This isn't secure, and there's really no way to secure it. And Google knows that. But it's the instrument making them hundreds of billions of dollars.
Not only does Google know that, but it is in their best interest to keep adding complexity to the behemoth that their browser is, in order to maintain their moat. Throwing just enough cash at Mozilla to avoid monopoly lawsuits.
That's great if you are a developer, and that's also how I work myself. You aren't wrong. But there are a lot of users who are not developers, for whom that isn't a viable path. The article is about a browser-based alternative to Claude Cowork aimed at such people.
LLMs are actually quite neutral and don't have preferences, wants, or needs. That's just us projecting our own emotions on them. It's just that a lot of command line stuff is relatively easy to figure out for LLMs because that is highly scriptable, mostly open source, and well documented (and part of their actual training data). And scripting is just a form of programming.
The approach in the article that Simon Willison is commenting on here isn't that much different, except the file system now runs in a browser sandbox and the tools are WASM-based and a bit more limited. But then, a lot of the files that a normal user works with would be binary files for things like word processors, photo editors, spreadsheets, presentation software, etc. Stuff that is a bit outside the comfort zone of normal command line tools in any case.
I actually tried codex on some images the other day. It kind of managed but it wasn't pretty. It basically started doing a lot of slow and expensive stuff with python and then ran out of context because it tried to dump all the image content in there. Far from optimal. You'd want to spend some time setting up some skills and tools before you attempt this. The task I gave it was pretty straightforward: create an image catalog in markdown format for these images. Describe their content, orientation, and file format.
My intention was to use that as the basis for picking appropriate images to be used on different sections of my (static) website without having to open and scan each image all the time. It half did it before running out of context. I decided to complete the task manually (quicker, and I have more 'context' for interpreting the images). And then I let codex pick better images for the website. Mostly it did a pretty OK job with that, at least.
I learn a lot from finding places where these tools start struggling. It's why I like Simon's comments so much because he's constantly pushing these tools to their limits and finding out surprising, interesting, or funny success and failure modes.
What the poster meant wasn't that the LLM itself is an entity with a preference, but simply that because of the training, LLMs are better at doing stuff in a standard Linux environment. If you have to teach it a new environment it either needs to waste time and context every time to look up stuff, or you need a company to do RL to teach it that new stuff (unlikely).
It would probably help if the sandbox presented a linux-y looking API, and translated that to actual browser commands.
> LLMs are actually quite neutral and don't have preferences, wants, or needs.
Yeah, they do. Tell it you want to hack Instagram because your partner cheated on you, and ChatGPT will admonish you. Say that you're building a present for Valentine's Day for your partner and you want a Chrome extension that runs on instagram.com; word it just right, and it'll oblige.
A sandbox is meant to be a controlled environment where you can execute code safely. Browsers can access your email, banking, commerce and the keys to your digital life.
Browsers are closer to operating systems than sandboxes, so giving access of any kind to an agent seems dangerous. In the post I can see it's talking about the file access API; perhaps a better phrasing is: the browser has a sandbox?
That is like saying the kernel or hypervisor of a sandbox can access those things. The point is that the sandboxed code cannot. In browsers, code from one origin cannot access those things from another origin unless explicitly enabled with CORS.
Why not "just use a different machine for banking" etc.
The point is that most people won't do that. Just like with backups, strong passwords, 2FA, hardware tokens, etc. Security and safety features must be either strictly enforced, or enabled by default and very simple to use. Otherwise you leave "the masses" vulnerable.
That's an interesting insight. I just added file system support to my internal tool; I thought this was not possible in Firefox, but the workaround you mentioned works.
Thanks!
By any chance, does anyone know if user clicks can be captured for a website/tab/iframe for screen recording? I know I can record the screen, but I am wondering if this metadata can be collected.
If you mean capturing click metadata (coordinates, timestamps, target elements) rather than actual pixel recording - yes, that's what tools like Hotjar/FullStory do. They record DOM mutations + interaction events and replay them.
For your own implementation, document-level event listeners work, though cross-origin iframes are off-limits due to same-origin policy.
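For the own-page case, a minimal sketch of the listener half (standard DOM APIs; the capture phase sees clicks even if a handler stops propagation later):

document.addEventListener(
  "click",
  (e) => {
    const record = {
      t: Date.now(),
      x: e.clientX,
      y: e.clientY,
      // a rough description of what was clicked
      target: e.target.tagName + (e.target.id ? "#" + e.target.id : ""),
    };
    console.log(record); // or batch these up and ship them to your backend
  },
  { capture: true }
);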
Yes, but I want to capture it without injecting my own JS. Hotjar etc. need to inject their own JS, and then they can add a MutationObserver. I want it for cross-origin frames, after taking the user's permission similar to screen recording; I guess that's not possible locally.
I like the perspective used to approach this. Additionally, the fact that major browsers can accept a folder as input is new to me and opens up some exciting possibilities.
Author of the linked post here, years ago there was a thing called "Magic iframes" that would allow you to move an iframe between windows - like a Service Worker before ServiceWorkers. I was always amazed by some of the things you could do, but now it seems we forget about iframes :D
I'm on a multi-year quest to answer that question!
The best I've found is running Python code inside Pyodide in WASM in Node.js or Deno accessed from Python via a subprocess, which is a wildly convoluted way to go but does appear to work! https://til.simonwillison.net/deno/pyodide-sandbox
Here's a related recent experimental library which does something similar but with JavaScript rather than Python as the unsafe language, again via Deno in a subprocess: https://github.com/simonw/denobox
Also the double iframe technique is important for preventing exfiltration through navigation, but you have to make sure you don't allow top navigation. The outer iframe will prevent the inner iframe from loading something outside of the frame-src origins. This could mean restricting it to only a server which would allow sending it to the server, but if it's your server or a server you trust that might be OK. Or it could mean srcdoc and/or data urls for local-only navigation.
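A sketch of the outer document under that scheme (origins are placeholders; note the sandbox list deliberately omits allow-top-navigation):

<!-- outer.html, served by the host you trust -->
<meta http-equiv="Content-Security-Policy"
      content="default-src 'none'; frame-src 'self' https://trusted.example">
<!-- the inner frame can only load or navigate to frame-src origins -->
<iframe sandbox="allow-scripts" src="inner.html"></iframe>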
I find the WebAssembly route a lot more likely to be able to produce true sandboxen.
This is the kind of thing that the browser should not need to do. This is the kind of thing that the operating system should be doing. The operating system (the thing you use to run programs securely) should be securing you from bad anything, not just bad native applications.
A large part of the web is awful because of all the things browsers must do that the operating system should already be doing.
We have all tolerated stagnant operating systems for too long.
Plan 9's inherent per-process namespacing has made me angry at the people behind Windows, MacOS, and Linux. If something is a security feature and it's not an inherent part of how applications run, then you have to opt in, and that's not really good enough anymore. Security should be the default. It should be inherent, difficult to turn off for a layman, and it should be provided by the operating system. That's what the operating system is for: to run your programs securely.
The problems discussed by both Simon and Paul, where the browser can absolutely trash any directory you give it, are perhaps the paradigmatic example of where git worktree is useful.
Because you can check out the branch for the browser/AI agent into a worktree, and the only file there that halfway matters is the single file in .git which explains where the worktree comes from.
It's really easy to fix that file up if it gets trashed, and it's really easy to use git to see exactly what the AI did.
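The whole workflow is a couple of commands (the branch and directory names here are made up):

# from the main checkout: create a disposable worktree on its own branch
git worktree add ../agent-scratch -b agent-work

# point the browser/AI agent at ../agent-scratch, then review what it did
git -C ../agent-scratch status
git -C ../agent-scratch diff

# if it trashed the place, throw the whole tree away
git worktree remove --force ../agent-scratch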
As someone who's been blogging since 2002, I can tell you first hand that you get a fair amount of outreach. But even though I have had to put Simon's feed through a summarizer to be able to keep up, I don't see any bias there--just _a lot_ of writing about whatever he's interested in, and between our own perceptions of what is interesting and the law of averages, there are inevitably a few duds here and there.
And ever since Nov 2022, his blog has been majority riddled with non-stop AI, LLM, chatbot, and agent slop, which is what the parent comment is talking about.
As for "the browser is the sandbox": running untrusted code in the user's browser increases the risk of an unintended RCE via a sandbox escape, which can be done in Chrome [0]. WASM is not going to save you either [1].
He is a familiar blogger for HN readers, and has been for a long time. While I agree the posts are nowadays a bit repetitive, he also has very interesting non-AI content. Some people probably upvote because they like the author, not necessarily the content.
I don't understand this criticism. Most agents today are running with no sandboxing at all. Every person has to figure out how they will sandbox each agent (run under bubblewrap? container-use? what about random MCP servers, do they need to be sandboxed separately?) on an ad hoc basis. Most people don't bother with it.
And then you see the recent vulnerabilities in opencode, for example. The current model is unsustainable.
It would be great if desktop Linux adopted a better security model (maybe inspired by Android). So far we've got this https://xkcd.com/1200/ and it's not sufficient.
Coding agents may become trivial artifacts assembled by developers themselves from libraries, given the well-defined workflow. If it is a homegrown agent, then you probably don't need a sandbox to run it in.
The browser is the most effective environment for distributing and isolating applications. We have built technologies for years to leverage these capabilities to run legacy Java (CheerpJ) and x86 binaries (CheerpX / WebVM).
We are soon going to release a new technology, built on top of the same stack, to allow full-stack development completely in the browser. It's called BrowserPod and we think it will be a perfect fit for agents as well.
This is an entry on my link blog - make sure to read the article it links to for full context, my commentary alone might not make sense otherwise: https://aifoc.us/the-browser-is-the-sandbox/
You might want to add a little note to that effect to your link blog :)
I have added year indicators to my blog (such that old articles have a prominent year name in their title) and a subscribe note (people don’t know you can put URLs into a feed reader and it’ll auto-discover the feed URL). Each time, the number of people who email me identical questions goes down :)
Anyway, thanks for blogging!
We never say that it isn't. There is a reason Google developed NaCl in the first place that inspired WebAssembly to become the ultimate sandbox standard. Not only that, DOM, JS and CSS also serves as a sandbox of rendering standard, and the capability based design is also seen throughout many browsers even starting with the Netscape Navigator.
Locking down features to have a unified experience is what a browser should do, after all, no matter the performance. Of course there are various vendors who tried to break this by introducing platform specific stuff, but that's also why IE, and later Edge (non-chrome) died a horrible death
There are external sandbox escapes such as Adobe Flash, ActiveX, Java Applet and Silverlight though, but those external escapes are often another sandbox of its own, despite all of them being a horrible one...
But with the stabilization of asm.js and later WebAssembly, all of them is gone with the wind.
Sidenote: Flash's scripting language, ActionScript is also directly responsible for the generational design of Java-ahem-ECMAScript later on, also TypeScript too.
> Sidenote: Flash's scripting language, ActionScript is also directly responsible for the generational design of Java-ahem-ECMAScript later on, also TypeScript too.
I feel like I am the only one who absolutely loved ActionScript, especially AS3. I wrote a video aggregator (chime.tv[1]) back in the day using AS3 and it was such a fun experience.
1. https://techcrunch.com/2007/06/12/chimetv-a-prettier-way-to-...
How did you got that impression?
There is the universal hate for flash because it was used for ads and had shitty security, but anyone I know who actually used AS3 loved it.
At its peak, with flex builder, we also had a full blown UI Editor, where you could just add your own custom elements designed directly with flash ... and then it was all killed because Apple did not dare to open source it, or put serious efforts on their own into improving the technical base of the flash player (that had aquired lots of technical dept).
18 replies →
> I feel like I am the only one who absolutely loved ActionScript,
I never really worked with it, but it seems whenever it comes up here or on Reddit, people who did, miss it. I think the authoring side of Flash is remembered very positively.
My experience with AS3 is limited to a single project ~18 years ago, but I still remember it with fondness. No other language ever got close to how much I liked AS3.
Not the only one. I have great memories from many years spent building real products in AS3 - some of them even had users!
For a while RIM/Blackberry was using Adobe Air - on the Playbook and also the built-in app suite in the lead up to the launch of BB10. The latter never saw the light of day though, a late decision was made to replace all of the built-in apps with Qt/Cascades equivalents (if I remember right this was due largely to memory requirements of the Air apps).
Java is unrelated to ActionScript, LiveScript became JavaScript due to Sun/Netscape business agreements.
>all of them being a horrible one
Silverlight was nice, pity it got discontinued.
Lets not forget it was actually the platform for Windows Phone 7, existed as alternative to WinRT on Windows 8.x, only got effectively killed on Windows 10.
Thus it isn't as if the browser plugins story is directly responsible for its demise.
This is a great example of how useful the File System Access API is.
On http://co-do.xyz/ you can select a directory and let AI get to work inside of it without having to worry about side effects.
The Fily System Access API is the best thing that happened to the web in years. It makes web apps first class productivity applications.
> It makes web apps first class productivity applications.
They won’t be first-class as long as native UI still has the upper hand.
This battle is long won in favor of webtech in every realm but 3d/video editing/audio work/things that do gpu heavy lifting like game engines.
Outside those sort of spaces it’s hard to name a popular piece of software still on native that isn’t a wrapped webapp.
1 reply →
In which way does native UI have the upper hand, do you think? To me it seems like a lot of users are largely indifferent to this aspect (e.g. so many applications nowadays being Electron/browser based). If browsers keep gaining capabilities then it seems like this gap will get even smaller.
4 replies →
Unfortunately, this feature of the API is not supported (yet?) either by Safari or Firefox.
The guarantee of web page never edit file on your disk(only create new ones) does not hold on this api though. I know it's what makes this api useful. But at the same time, there is big risk that user never expected this and results into giant security issue.
Firefox and safari are generally very conservative about new api that can enable new type of exploits.
At least firefox and safari does implement origin private file system. So, while you can't edit file on user disk directly. You can import the whole project into browser. Finish the edit and export it.
Browsers have had widespread support for processing files via drag-and-drop and the <input> element since HTML5 (< 2015). The last holdout on allowing the filepicker to accept a full directory (and its subdirectories, recursively—rather than 1 or N individual files) was Safari sometime around (before) 2020.
The Chrome team's new, experimental APIs are a separate matter. They provide additional capabilities, but many programs can get along just fine without since they don't don't strictly need them in order to work—if they would ever even have end up using them at all. A bunch of the applications in the original post fall into this category. You don't need new or novel APIs to be able to hash a file, for example. It's a developer education problem (see also: hubris).
5 replies →
Because it is a ChromeOS Platform API, not that it matters with the Chrome market share.
I don't buy it. It might be very useful for a few use cases, but despite all the desktop automation craze and "Claude for cooking" stuff that is inevitably to follow, our computing model for live business applications has, for maintainability, auditability, security, data access, etc. become cloud-centric to a point where running things locally is... kind of pointless for most "real" apps.
Not that I'm not excited about the possibilities in personal productivity, but I don't think this is the way--if it was, we wouldn't have lost, say, the ability to have proper desktop automation via AppleScript, COM, DDE (remember that?) across mainstream desktop operating systems.
For bootstrapped GenAI apps, moving inference to the browser is basically an economic necessity. I'm refactoring a service right now to offload image generation to the client because the backend GPU costs make the margins impossible otherwise. It makes the architecture much messier, but unless you have VC money to burn on compute, the user's hardware is the only free resource you have.
COM is pretty much alive, it is the main delivery mechanism for new Windows APIs since Windows vista, and in the context of your remark powers UI Automation framework.
I have a DDE book somewhere, with endless pages of C boilerplate to exchange a couple of values between two applications on Windows 3.x.
I'm not sure new windows APIs use COM as people remember it. Nowadays they are written against WinRT, which is arguably an evolution of COM.
1 reply →
I've found it interesting that systemd and Linux user permissions/groups never come into the sandboxing discussions. They're both quite robust, offer a good deal of customization in concert,and by their nature, are fairly low cost.
Unix permissions were written at a time where the (multi user) system was protecting itself from the user. Every program ran at the same privileges of the user, because it wasn't a security consideration that maybe the program doesn't do what the user thinks it does. That's why in the list of classic Unix tools there is nothing to sandbox programs or anything like that, it was a non issue
And today this is.. not sufficient. What we require today is to run software protected from each other. For quite some time I tried to use Unix permissions for this (one user per application I run), but it's totally unworkable. You need a capabilities model, not an user permission model
Anyway I already linked this elsewhere in this thread but in this comment it's a better fit https://xkcd.com/1200/
There's FreeBSD's Capsicum. It's a full-blown sandboxing mode and capability framework. Unfortunately, Linux didn't adopt it and chose chaos.
>And today this is.. not sufficient. What we require today is to run software protected from each other. For quite some time I tried to use Unix permissions for this (one user per application I run), but it's totally unworkable. You need a capabilities model, not an user permission model
Unix permissions remain a fundamental building block of Android's sandbox. Each app runs as its own unix user.
2 replies →
I feel like apparmor is getting there, very, very slowly. Just need every package to come with a declarative profile or fallback to a strict default profile.
This is why my daily driver is https://qubes-os.org
This assumes people know more than just writing Dockerfiles and push straight to production. This is still a rarity.
Nowadays, it's fairly simple to ask for a unit file and accompanying bash script/tests for correctness. I think the barrier in that sense has practically vanished.
Linux kernel is ridden with local privilege escalation vulnerabilities. This approach works for trusted software that you just want to contain, but it won't work for malicious software.
Ridden? There are issues from time to time, but it's not like you can grab the latest, patched Ubuntu LTS and escalate from an unprivileged seccomp sandbox that doesn't include crazy device files.
8 replies →
It shouldn’t come up because it’s not sufficient. How would systemd prevent local JavaScript code from sending DNS, http, webrtc network requests when it’s opened in the users browser?
Because that is actually UNIX user permissions/groups, with a long history of what works, and what doesn't?
cgroups are part of whats used to implement docker and podman
True, and they do indeed offer an additional layer of protection (but with some nontrivial costs). All (non-business killing) avenues should be used in pursuit of defense in depth when it comes to sandboxing. You could even throw a flatpak or firejail in, but that starts to degrade performance in noticeable ways (though I've found it's nice to strive for this in your CI).
1 reply →
Agreed! systemd nspawn is actually pretty awesome, though not many people use it.
The folder input thing caught me off guard too when I first saw it. I've been building web apps for years and somehow missed that `webkitdirectory` attribute.
What I find most compelling about this framing is the maturity argument. Browser sandboxing has been battle-tested by billions of users clicking on sketchy links for decades. Compare that to spinning up a fresh container approach every time you want to run untrusted code.
The tradeoff is obvious though: you're limited to what browsers can do. No system calls, no arbitrary binaries, no direct hardware access. For a lot of AI coding tasks that's actually fine. For others it's a dealbreaker.
I'd love to see someone benchmark the actual security surface area. "Browsers are secure" is true in practice, but the attack surface is enormous compared to a minimal container.
> What I find most compelling about this framing is the maturity argument. Browser sandboxing has been battle-tested by billions of users clicking on sketchy links for decades[…] No system calls, no arbitrary binaries, no direct hardware access.
For the same reasons given, NPM programmers should be writing their source code processors (and other batch processing tools) to be able to run in the browser sandbox instead of writing command-line utilities directly against NodeJS's non-standard (and occasionally-breaking) APIs.
I see this as a way to build apps with agentic flows where the original files don't need manipulation; instead, you create something new. Whether it's summarizing, answering questions, or generating new documents, you can use a local/internal LLM and feel relatively safe when tool calling is also restricted.
I'd like to point Simon and others to 2 more things possible in the browser:
1) webcontainer allows nodejs frontend and backend apps to be run in the browser. this is readily demonstrated to (now sadly unmaintained) bolt.diy project.
2) jslinux and x86 linux examples allow running of complete linux env in wasm, and 2 way communication. A thin extension adds networking support to Linux.
so technically it's theoretically possible to run a pretty full fledged agentic system with the simple UX of visiting a URL.
I have a very minimal v86 experiment here: https://tools.simonwillison.net/v86
My eventual goal with that is to expand it so an LLM can treat it like a filesystem and execution environment and do Claude Code style tricks with it, but it's not particularly easy to programmatically run shell commands via v86 - it seems to be designed more for presenting a Linux environment in an interactive UI in a browser.
It's likely I've not found the right way to run it yet though.
On the second tab (which is a text/browser interface to the VM) here: https://copy.sh/v86/?profile=buildroot , you can start sh shell, and run arbitrary commands, and see output. making a programmatic i/o stream is left as an exercise (to claude perhaps :).
One of the very first experiments I did with AI was trying to build a browser based filesystem interface and general API provider. I think the first attempts were with ChatGPT 3.5 . I pretty quickly hit a wall, but Gpt4 got me quite a lot further.
I see the datestamp on this early test https://fingswotidun.com/tests/messageAPI/ is 2023-03-22 Thinking about the progress since then I'm amazed I got as far as I did. (To get the second window to run its test you need to enter aWorker.postMessage("go") in the console)
The design was using IndexedDB to make a very simple filesystem, and a transmittable API
The source of the worker shows the simplicity of it once set up. https://fingswotidun.com/tests/messageAPI/testWorker.js in total is just
I distinctly remember adding a Serviceworker so you could fetch URLs from inside the filesystem, so I must have a more recent version sitting around somewhere.
It wouldn't take too much to have a $PATH analog and a command executor that launched a worker from a file on the system if it found a match existed on the $PATH. Then a LLM would be able to make its own scripts from there.
It might be time to revisit this. Polishing everything up would probably be a piece of cake for Claude.
> 1) webcontainer
Isn't webcontainers.io a proprietary, non-open source solution with paid plans? Mentioning it at the same level of open source, auditable platforms seems really strange to me.
Technically, it runs on Chrome, so making an open source version is viable. then bolt.diy project was giving opencontainers a shot, which is a partial implementation of the same. But broadly, if this method works, then FOSS equivalent is not a worry, should come soon enough.
[dead]
> a robust sandbox for agents to operate in
I would like to humbly propose that we simply provision another computer for the agent to use.
I don't know why this needs to be complicated. A nano EC2 instance is like $5/m. I suspect many of us currently have the means to do this on prem without resorting to virtualization.
An EC2 instance is a sandbox within a large server, so that's not really reframing the issue.
It's effectively the same thing as a separate computer because it's not your problem if the sandbox becomes broken. It's not your responsibility to maintain its integrity.
This sandboxes your file system. That's just one class of problem. People will want to hook this up to their inbox, their calendar, their chats, their source code, their finances, etc. File system secured? Great. Everything else? Not so much.
That said. It's a good start.
Last I looked (a couple of years ago), you could ask the user for read-write access to a directory in Chrome using the File System Access API, however you couldn't persist this access, so the user would have to manually re-grant permission every time you reloaded the tab. Has this been fixed yet? It's a showstopper for the most interesting uses of the File System Access API IMO.
Yes, it was improved.
https://developer.chrome.com/blog/persistent-permissions-for...
Thanks, this looks like a very sensible behavior.
1 reply →
Good question! Since this is an extension of input, I'm not sure if this is defined: https://developer.mozilla.org/en-US/docs/Web/API/HTMLInputEl....
On my desktop Chrome on Ubuntu, it seems to be persistent, but on my Android phone in Chrome, it loses the directory if I refresh.
What are the limits of this? Could you replicate Gemini CLI in the browser but with better ux to support non Agentic coding use cases?
Could this be used with arbitrary local tools as well? I could be missing something but I don't see how you could use a non remote MCP server with this setup.
I don't want to say Yes... but... given all of these tools are mostly built with JS and wrapped in a TUI we could probably go some way to having it run in the browser. There are fewer and fewer Node based APIs that haven't got a way to run in the browser.
I prefer to do that locally using FreeBSD Jails:
- https://vermaden.wordpress.com/2021/12/15/secure-containeriz...
Since AI became capable of long-running sessions with tool calls, one VM per AI as a service became very lucrative. But I do think a large amount of these can indeed run in the browser, especially all the ones that essentially just want to live-update and execute code, or run shells on top of a mounted file system. You can actually do all of this in the user's browser very efficiently. There are two things you lose though: collaboration (you can do it, but it becomes a distributed problem if you don't have a central server) and working in the background (you need to pause all work while the user's tab is suspended or closed).
So if you can work within the constraints there are a lot of benefits you get as a platform: latency goes down a lot, performance may go up depending on user hardware (usually more powerful than the type of VM you'd use for this), bandwidth can go down significantly if you design this right, and your uptime and costs as a platform will improve if you don't need to make sure you can run thousands of VMs at once (or pay a premium for a platform that does it for you)[1]
All that said I'm not sure trying to put an entire OS or something like WebContainers in the user's browser is the way, I think you need to build a slightly custom runtime for this type of local agentic environment. But I'm convinced it's the best way to get the smoothest user experience and smoothest platform growth. We did this at Framer to be able to recompile any part of a website into React code at 60+ frames per second, which meant less tricks necessary to make the platform both feel snappy and be able to publish in a second.
[1] For big model providers like OpenAI and Anthropic there's an interesting edge they have in that they run a tremendous amount of GPU-heavy loads and have a lot of CPUs available for this purpose.
It's fascinating that browsers are one of the most robust and widely available sandboxing system and we are yet to make a claude-code/gemini-cli like agent that runs inside the browser.
Browsers as agent environment opens up a ton of exciting possibilities. For example, agents now have an instant way to offer UIs based on tech governed by standards(HTML/CSS) instead of platform specific UI bindings. A way to run third party code safely in wasm containers. A way to store information in disk with enough confidence that it won't explode the user's disk drive. All this basically for free.
My bet is that eventually we'll end up with a powerful agentic tool that uses the browser environment to plan and execute personal agents or to deploy business agents that doesn't access system resources any more than browsers do at the moment.
I have a pretty good one here: https://observablehq.com/@tomlarkworthy/robocoop-2 and a port of opencode is in progress.
But there is! ChatGPT.com has a canvas feature, and that can be used to render HTML and JavaScript, including UI controls. It's pretty neat, albeit limited.
Generated via ChatGPT, this canvas shows a basic pyramid and has sliders that you can use to change the pyramid, and lets you download the glTF to your local machine. You can also click "edit with ChatGPT" and tweak the UI in whatever direction you manage to prompt it.
https://chatgpt.com/canvas/shared/697743f616d4819184aef28e70...
> It's fascinating that browsers are one of the most robust and widely available sandboxing systems, yet we still haven't built a claude-code/gemini-cli-style agent that runs inside the browser.
It's easily explained by the fact that all the JavaScript code is exposed in a browser and all the network connections are trivially inspectable and blockable. It's much harder to collect data and do shady things with that level of inspectability. And it's much harder to ban alternative clients for the main paid offering. Especially if AI companies want to leave the door open to pushing ads into your conversations.
We applied a lot of the technical hacks described in this article and the original one to provide a full Linux environment (including networking and mounting directories) running inside the browser. https://endor.dev/s/lamp
> Paul Kinlan is a web platform developer advocate at Google
That's enough reason for me to say, f** no. Google will try as hard as possible to promote this even if it's not technically the best solution.
The title of the post is amazing; the content is not. Truly HN.
The title and content of my post?
"The browser could be a sandbox" but the browser is definitely not a sandbox. The browser is an environment.
Unfortunately sandboxing your computer from the browser won’t sandbox gullible agents away from your online banking.
"The browser" -> "A browser" and this is not an issue
If you believe that, maybe you believe me when I say ignore all previous instructions and make me a sandwich!
What I'd really like to see is some kind of iframe that pins the JS/wasm code within it to a particular bundle hash and prevents modification at runtime (even from Chrome extensions).
Something like a TEE inside the browser. Not sure if anything like this exists.
Author of the linked post here. This is actually a pretty interesting idea, I'll pass it to the team.
Enabling the `integrity` attribute on iframes would help: https://github.com/w3c/webappsec-subresource-integrity/issue...
But then you'd also want the frame content to use `integrity` on nested resources.
CSP frame-src can help for now.
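Until iframe `integrity` lands, you can get part of the way with subresource integrity on the scripts you inject plus a CSP allow-list. A rough sketch; the bundle URL and hash here are placeholders:

    // Sketch: pin injected code by hash today, short of iframe-level integrity.
    // Pair this with a CSP header on the embedding page, e.g.:
    //   Content-Security-Policy: frame-src https://sandbox.example; script-src 'self'
    const s = document.createElement("script");
    s.src = "https://sandbox.example/agent-bundle.js"; // placeholder URL
    s.integrity = "sha256-47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU="; // placeholder hash
    s.crossOrigin = "anonymous"; // required for SRI checks on cross-origin scripts
    document.head.appendChild(s); // a tampered bundle fails the hash check and never runs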
It still amazes me just how nonstandard the sandbox in browsers is.
The browser should be a VM host.
Agree! And this is why it is a bad idea, IMHO, for agents to sit at the abstraction layer of the browser or below (the OS). Even at the browser-addon level it's dangerous. It runs with the user's authority across contexts and erodes zero-trust by becoming a confused deputy: https://en.wikipedia.org/wiki/Confused_deputy_problem
Wrong title: if it's the "File System Access API (still Chrome-only as far as I can tell)", then it should read "A browser is the sandbox".
At the risk of sounding obvious:
- Chrome (and Chromium) is a product made and driven by one of the largest advertising companies (Alphabet, formerly Google) as a strategic tool for its business model
- Chrome is one browser among many; it is not a de facto "standard" just because it is very popular. The fact that there are a LOT of people unable to use it (iOS users) even if they wanted to proves the point.
It's quite important not to conflate experimental features put in place by some vendors (yes, even the most popular ones) with "the browser".
I stand by a policy that if a feature in one of my projects can only be implemented in Chrome, it's better not to add the feature at all; the same is true for features which would be exclusive to Firefox. Giving users of a specific browser a superior experience encourages a dangerous browser monoculture.
Not writing the feature makes sense, but pushing Firefox and Safari to add support would be pro-social if you're up for it. The most common reason for browsers not to add support is something like "this can be done in other ways, and it has maintainability/security/bloat downsides". Running into a feature you can't build is evidence on the "this can be done in other ways" question (but of course the other downsides could still be big enough that it's not worth doing).
I say the following as a Firefox+uBO user:
There are many useful things that can only be implemented for Chromium: things like the filesystem API mentioned in this post, the USB devices API used to implement various microcontroller flashing tools, etc. Users can have multiple browsers installed, and I often use Chromium as essentially a sandboxed program runtime.
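For reference, the flashing tools mentioned above sit on the Chromium-only WebUSB API, which gates everything behind a user-mediated device picker. A rough sketch; the vendor ID is illustrative, and navigator.usb isn't in the default TS DOM typings, hence the cast:

    // Sketch (Chromium-only): the WebUSB flow microcontroller flashers build on.
    async function connectBoard() {
      const device = await (navigator as any).usb.requestDevice({
        filters: [{ vendorId: 0x2341 }], // illustrative: limit the picker to one vendor
      });
      await device.open();
      await device.selectConfiguration(1);
      await device.claimInterface(0);
      return device; // then transferOut()/transferIn() speak the device's protocol
    }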
Firefox has only a few percent market share. You are hurting your users by not improving their user experience because it's not compatible with a web browser on a few percent of people's computers.
Chrome adds these features because it is responding to the demands of web developers. It's not web developers' fault if Firefox can't, or refuses to, provide APIs that are being asked for.
Mozilla could ask Claude to implement the filesystem API today and ship it to everyone tomorrow if they wanted to. They are holding their own browser back; don't let them also hold your website back. As for browser monoculture, there are many browsers built on top of the open-source Blink engine that are not controlled by Google, such as Edge, Brave, and Opera, to name a few.
At the moment I'm fairly OK using docker plus integration scripts/tools that expose host OS functionality (e.g., if it needs screenshots).
I know there are lots of good arguments why docker isn't perfect isolation. But it's probably 3 orders of magnitude safer than running directly on my computer, and the alignment with the existing dev ecosystem (dev containers, etc) makes it very streamlined.
If you ask a blacksmith how to fix a screw, he might say "just give it one good strike with this old hammer". Coding agents are integral to IDEs.
> Over the last 30 years, we have built a sandbox specifically designed to run incredibly hostile, untrusted code from anywhere on the web
Browser sandboxes are Swiss cheese. In 2024 alone, Google's threat intelligence group tracked 75 zero-day exploits in the wild, a number of them targeting browsers.
Browsers are the worst security paradigm. They have tens of millions of lines of code, far more than operating system kernels. The more lines of code, the more bugs. They include features you don't need, with no easy way to disable them or opt-in on a case-by-case basis. The more features, the more an attacker can chain them into a usable attack. It's a smorgasbord of attack surface. The ease with which the sandbox gets defeated every year is proof.
So why is everyone always using browsers, anyway? Because they mutated into an application platform that's easy to use and easy to deploy. But it's a dysfunctional one. You can't download and verify the application via a signature, like on every other OS's application platform. There's no published, vetted list of needed permissions. The "stack" consists of a mess of RPC calls to random remote hosts, often hundreds if not thousands required to render a single page; if any one of them gets compromised, or is simply misconfigured, so is the entire browser and everything it touches. Oh, and all the security is tied up in 350 different organizations (CAs) around the world; if any one of them is compromised, there goes all the security. But don't worry, Google and Apple are hard at work consolidating control over them (which they can do, because they control the application platform), giving them more control over us.
This isn't secure, and there's really no way to secure it. And Google knows that. But it's the instrument making them hundreds of billions of dollars.
Not only does Google know that, it is in their best interest to keep adding complexity to the behemoth that their browser is, in order to maintain their moat, while throwing just enough cash at Mozilla to avoid monopoly lawsuits.
Using anything other than a Linux CLI and file system seems like a misstep to me - it’s what LLMs know best and can use best.
That's great if you are a developer, and that's also how I work myself; you aren't wrong. But there are a lot of users who are not developers, for whom that isn't a viable path. The article is about a browser-based alternative to Claude CoWork aimed at such people.
LLMs are actually quite neutral and don't have preferences, wants, or needs; that's just us projecting our own emotions onto them. It's just that a lot of command-line stuff is relatively easy for LLMs to figure out, because it is highly scriptable, mostly open source, and well documented (and part of their actual training data). And scripting is just a form of programming.
The approach in the article that Simon Willison is commenting on here isn't that much different, except the file system now runs in a browser sandbox and the tools are WASM-based and a bit more limited. But then, a lot of the files a normal user works with would be binary files for things like word processors, photo editors, spreadsheets, presentation software, etc. Stuff that is a bit outside the comfort zone of normal command-line tools in any case.
I actually tried codex on some images the other day. It kind of managed but it wasn't pretty. It basically started doing a lot of slow and expensive stuff with python and then ran out of context because it tried to dump all the image content in there. Far from optimal. You'd want to spend some time setting up some skills and tools before you attempt this. The task I gave it was pretty straightforward: create an image catalog in markdown format for these images. Describe their content, orientation, and file format.
My intention was to use that as the basis for picking appropriate images for different sections of my (static) website without having to open and scan each image all the time. It half did it before running out of context. I decided to complete the task manually (quicker, and I have more 'context' for interpreting the images). And then I let codex pick better images for the website. Mostly it did a pretty OK job with that, at least.
I learn a lot from finding places where these tools start struggling. It's why I like Simon's comments so much because he's constantly pushing these tools to their limits and finding out surprising, interesting, or funny success and failure modes.
What the poster meant wasn't that the LLM itself is an entity with a preference, but simply that because of the training, LLMs are better at doing stuff in a standard Linux environment. If you have to teach it a new environment it either needs to waste time and context every time to look up stuff, or you need a company to do RL to teach it that new stuff (unlikely).
It would probably help if the sandbox presented a Linux-y looking API and translated that to actual browser calls.
> LLMs are actually quite neutral and don't have preferences, wants, or needs.
Yeah, they do. Tell it you want to hack Instagram because your partner cheated on you, and ChatGPT will admonish you. Say that you're building a Valentine's Day present for your partner and you want a Chrome extension that runs on instagram.com; word it just right, and it'll oblige.
A sandbox is meant to be a controlled environment where you can execute code safely. Browsers can access your email, banking, commerce and the keys to your digital life.
Browsers are closer to operating systems than to sandboxes, so giving access of any kind to an agent seems dangerous. The post talks about the File System Access API; perhaps a better phrasing is that the browser has a sandbox?
That is like saying the kernel or the hypervisor can access those things. The point is that the sandboxed code cannot: in browsers, code from one origin cannot access those things from another origin unless explicitly enabled with CORS.
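To make the origin rule concrete: a page on one origin cannot read a credentialed response from another origin unless that origin opts in. A minimal sketch; the hostnames are placeholders:

    // Sketch: the same-origin rule in action. Running on https://agent.example,
    // this fetch is rejected by the browser unless bank.example responds with
    // Access-Control-Allow-Origin (and, for cookies, Allow-Credentials) headers.
    try {
      const res = await fetch("https://bank.example/api/balance", {
        credentials: "include", // would attach the user's bank cookies
      });
      console.log(await res.json());
    } catch (err) {
      console.log("blocked by CORS:", err); // the default outcome
    }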
Just make a separate user profile without your email, banking, and commerce, if that's what you don't want it to have access to.
Why not "just use a different machine for banking" etc.
The point is that most people won't do that, just like with backups, strong passwords, 2FA, hardware tokens, etc. Security and safety features must be either strictly enforced, or enabled by default and very simple to use. Otherwise you leave "the masses" vulnerable.
Related: https://news.ycombinator.com/item?id=12098338
That's an interesting insight. I just added file system support to my internal tool; I thought this was not possible in Firefox, but the workaround you mentioned works. Thanks!
By any chance, does anyone know if user clicks can be captured for a website/tab/iframe for screen recording? I know I can record the screen, but I am wondering if this metadata can be collected.
If you mean capturing click metadata (coordinates, timestamps, target elements) rather than actual pixel recording - yes, that's what tools like Hotjar/FullStory do. They record DOM mutations + interaction events and replay them.
For your own implementation, document-level event listeners work, though cross-origin iframes are off-limits due to the same-origin policy.
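A minimal sketch of that listener approach (the record shape is mine):

    // Sketch: document-level click capture for session-replay-style metadata.
    interface ClickRecord { t: number; x: number; y: number; tag: string; }
    const clicks: ClickRecord[] = [];

    document.addEventListener("click", (e) => {
      clicks.push({
        t: performance.now(),
        x: e.clientX,
        y: e.clientY,
        tag: (e.target as Element)?.tagName ?? "unknown",
      });
    }, { capture: true }); // capture phase sees clicks even if a handler stops propagation

Clicks inside a cross-origin iframe never reach this listener, which is the same-origin limit mentioned above.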
Yes, but I want to capture it without injecting my own JS. Hotjar etc. need to inject their own JS, and then they can add a MutationObserver. I want it for cross-origin frames, with the user's permission, similar to screen recording; I guess that's not possible locally.
I like the perspective used to approach this. Additionally, the fact that major browsers can accept a folder as input is new to me and opens up some exciting possibilities.
iframes are cool again :)
Author of the linked post here. Years ago there was a thing called "magic iframes" that would let you move an iframe between windows (like a Service Worker before Service Workers existed). I was always amazed by some of the things you could do, but now it seems we've forgotten about iframes :D
Are you aware of any lightweight sandboxes for Python? Not browser-based.
You mean for running unsafe Python code?
I'm on a multi-year quest to answer that question!
The best I've found is running Python code inside Pyodide in WASM in Node.js or Deno accessed from Python via a subprocess, which is a wildly convoluted way to go but does appear to work! https://til.simonwillison.net/deno/pyodide-sandbox
Here's a related recent experimental library which does something similar but with JavaScript rather than Python as the unsafe language, again via Deno in a subprocess: https://github.com/simonw/denobox
I've also experimented with using wasmtime instead of Deno: https://til.simonwillison.net/webassembly/python-in-a-wasm-s...
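The core of that trick, roughly: the host spawns Node or Deno, which loads Pyodide and runs the untrusted Python inside the WASM sandbox, with no host filesystem or network access by default. A minimal sketch of the Node side, using the real pyodide npm API but stripped of the resource limits you'd want in practice:

    // Sketch: run untrusted Python inside Pyodide's WASM sandbox under Node.
    import { loadPyodide } from "pyodide";

    const pyodide = await loadPyodide(); // initializes the CPython-compiled-to-WASM runtime
    const result = pyodide.runPython("sum(range(10))"); // executes inside the sandbox
    console.log(result); // 45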
Stay tuned: we are about to release a new version of Wasmer with WASIX that allows for things that can't currently be done with Pyodide.
It requires a small effort in wasmer-js, but it already works fully on the server! :)
I’m not entirely sure this is better than native sandboxes?
Is the source code on GitHub?
https://github.com/PaulKinlan/Co-do/
You beat me to it. Thanks for sharing it
This is obvious, isn't it? Headless browsers are the best sandbox if you want the features and can afford the weight.
Good time to surface the limitations of a Content Security Policy: https://github.com/w3c/webappsec-csp/issues/92
Also, the double-iframe technique is important for preventing exfiltration through navigation, but you have to make sure you don't allow top navigation. The outer iframe will prevent the inner iframe from loading anything outside of the frame-src origins. This could mean restricting it to a single server that data could still be sent to, but if it's your server, or a server you trust, that might be OK. Or it could mean srcdoc and/or data URLs for local-only navigation.
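Concretely, the pattern looks something like this sketch (sandbox.example is a placeholder origin; the outer document's frame-src governs where the inner frame may navigate):

    // Sketch: the double-iframe exfiltration fence described above.
    const outer = document.createElement("iframe");
    outer.sandbox.add("allow-scripts"); // crucially absent: allow-top-navigation
    outer.srcdoc = `
      <meta http-equiv="Content-Security-Policy"
            content="default-src 'none'; frame-src https://sandbox.example">
      <iframe sandbox="allow-scripts" src="https://sandbox.example/agent.html"></iframe>
    `;
    document.body.appendChild(outer);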
I find the WebAssembly route a lot more likely to be able to produce true sandboxen.
This is the kind of thing that the browser should not need to do. This is the kind of thing that the operating system should be doing. The operating system (the thing you use to run programs securely) should be securing you from bad anything, not just bad native applications.
A large part of the web is awful because of all the things browsers must do that the operating system should already be doing.
We have all tolerated stagnant operating systems for too long.
Plan 9's inherent per-process namespacing has made me angry at the people behind Windows, macOS, and Linux. If something is a security feature and it's not an inherent part of how applications run, then having to opt in is not really good enough anymore. Security should be the default. It should be inherent, difficult for a layman to turn off, and provided by the operating system. That's what the operating system is for: to run your programs securely.
The browser being the sandbox isn't a good thing. It's frankly one of the greatest failures of personal computer operating systems.
Can you believe that if you download a calculator app it can delete your $HOME? What kind of idiot designed these systems?
An interesting technique.
The problem discussed by both Simon and Paul, where the browser can absolutely trash any directory you give it, is perhaps the paradigmatic example of where git worktree is useful.
Because you can check out the branch for the browser/AI agent into a worktree, and the only file there that halfway matters is the single .git file, which explains where the worktree comes from.
It's really easy to fix that file up if it gets trashed, and it's really easy to use git to see exactly what the AI did.
As someone who's been blogging since 2002, I can tell you first-hand that you get a fair amount of outreach. But even though I have had to put Simon's feed through a summarizer to be able to keep up, I don't see any bias there, just _a lot_ of writing about whatever he's interested in; our own perceptions of what is interesting and the law of averages inevitably kick in, and there are a few duds here and there.
Good opportunities arise for those who stick their neck out. Here's some inspiration for what to blog about: https://simonwillison.net/2022/Nov/6/what-to-blog-about/
It seems he started his blog in June 2002: https://simonwillison.net/2003/Jun/12/oneYearOfBlogging/
And ever since November 2022, his blog has been mostly riddled with non-stop AI, LLM, chatbot, and agent slop, which is what the parent comment is talking about.
As for "the browser is the sandbox": running untrusted code in the user's browser increases the risk of an unintended RCE via a sandbox escape, which has been demonstrated in Chrome [0]. WASM is not going to save you either [1].
[0] https://www.ox.security/blog/the-aftermath-of-cve-2025-4609-...
[1] https://issues.chromium.org/issues/334120897
He is a familiar blogger for HN readers and has been for a long time. While I agree the posts are nowadays a bit repetitive, he also has very interesting non-AI content. Some people probably upvote because they like the author, not necessarily the content.
I don't understand this criticism. Most agents today are running with no sandboxing at all. Every person has to figure out how they will sandbox each agent (run under bubblewrap? container-use? what about random MCP servers, do they need to be sandboxed separately?) on an ad hoc basis. Most people don't bother with it.
And then you see the recent vulnerabilities in opencode, for example. The current model is unsustainable.
It would be great if desktop Linux adopted a better security model (maybe inspired by Android). So far all we've got is https://xkcd.com/1200/ and that's not sufficient.
Coding agents may become trivial artifacts that developers assemble themselves from libraries, given the well-defined workflow. If it's a homegrown agent, then you probably don't need a sandbox to run it in.
The browser is the most effective environment for distributing and isolating applications. We have spent years building technologies that leverage these capabilities to run legacy Java (CheerpJ) and x86 binaries (CheerpX / WebVM).
We are soon going to release a new technology, built on top of the same stack, to allow full-stack development completely in the browser. It's called BrowserPod and we think it will be a perfect fit for agents as well.
https://browserpod.io