Comment by christophilus
8 hours ago
I don’t trust any agent to respect any boundaries. They might today. But tomorrow’s vibe coded slip update might break it in subtle ways.
My solution to this is to only run agents in a sandbox of my own making (a locked down Podman container).
They can't respect boundaries as long as those boundaries exist only in the LLM instruction set. A human being who follows rules long enough the rules will become second nature (usually), almost to the point where long running companies are known for having rules no one understands (Chesterton's Fence is alive and well).
But an LLM have a limited "memory" and while the instructions might land in there and be of sufficient priority to be "respected" a single instance of that memory getting too full or the LLM autocompleting the work around because that was the statistical "best" solution and any barriers that exist only in LLM instructions and not in hardcoded guards will evaporate like so much morning fog.
I went the full virtual machine route. Just finished hardening the setup and firewalling it off my local network. Not perfect but it does make me feel much safer.