Comment by troupo

7 hours ago

Power Shell or Python scripts to work around restrictions are the go to for LLMs.

And it doesn't stop there.

Yesterday I was trying to figure out some icons issue in KDE plasma (I know nothing about KDE). Both Claude and Codex would run complex bus and debug queries and write and execute QML scripts with more and more tools thrown into the mix.

There's no way to properly block them with just allow- and block lists

11 comments

troupo

embedding-shape 4 hours ago

> There's no way to properly block them with just allow- and block lists

Especially not when some harnesses rely on the reliability of the LLM to determine what's allowed or not, pretty much "You shouldn't do thing X" and then asking the LLM to itself evaluate if it should be able to do it or not when it comes up. Bananas.

Only right and productive way to run an agent on your computer is by isolating it properly somehow then running it with "--sandbox danger-full-access --dangerously-bypass-approvals-and-sandbox" or whatever, I myself use docker containers, but there are lots of solutions out there.

felixyz 4 hours ago
You have to be extremely careful when you set up a dev container, lock down file access, do not give the agent the power to start other containers or "docker compose up", restrict network access to an allow-list etc. Just running the agent in a container does little to protect you. (Maybe you know this, but a lot of people don't!)
- embedding-shape 4 hours ago
  
  Most of those things are what happens by default. Sure, be careful, but by default it's secure enough to prevent most potential issues. No need to lock down file access for example, by default it only has access to files inside the container, and of course by default containers don't have access to start other containers, and so on.
  Good word of caution though, make sure you actually isolate when you set out to isolate something :)
  
  2 replies →

ebonnafoux 7 hours ago

In a previous employer, they block the chmod command. I took the habit to python -c "import os; os.chmod('my_file',744)".

Glad to see LLM re-discover this trick.

Terr_ 6 hours ago
> to see LLM re-discover
I imagine someone probably wrote very specifically about it in the training data that underwent lossy compression, and the LLM is decompressing that how-to.
So I'd say it's more like "surfacing" or "retrieving" than "re-discovering".
- seanp2k2 6 hours ago
  
  They scraped everything on Stackoverflow, likely IRC logs from Freenode, and every book written in the modern era courtesy of Sci-Hub / Library Genesis / Anna's Archive / Z Library.
  RIP Aaron Swartz, they're generating trillions in shareholder value from the spiritual successors to the work they were going to imprison you for.
- ebonnafoux 2 hours ago
  
  Indeed, I check and the solution was already on stack overflow https://askubuntu.com/a/1483248
andyhedges 6 hours ago
For the LLM it's a probabilistic set of strings that achieves the outcome, the highest probability set didn't work, try the next one until success or threshold met. A human sees the implicit difference between the obvious thing not working indicating someone doesn't want you to do it, but an LLM unless guided doesn't seen that sub-text.
So chmod +x file didn't work, now try python -c "import os; os.chmod('file',744)"
- sigmoid10 5 hours ago
  
  Humans and LLMs both only see that when given the right context. A tool not working in a corporate environment may be anything from oversight, malfunction all the way to security block. Knowing which one it is takes a lot of implicit knowledge. Most people fail to provide this level of context to their LLMs and then wonder why they act so generic. But they are trained to act in the most generic way unless given context that would deviate from it.