Comment by boc

10 hours ago

LLMs can research what a tool does before calling it though - they'll sniff that one out pretty quick.

I think the better route is to be honest: say that database integrity is a foundation of the company, that no task worth pursuing would require touching the database, and specifically ask it to think hard before doing anything that gets close to the production data, etc.

I run a much lower-stakes version where an LLM has a key that could delete a valuable product database if it were so inclined. I've built a strong framework around how and when destructive edits can be made (they cannot); specifically, I say that any destructive command (DROP, rm, etc.) needs to be handed to the user to implement. Between that framework and Claude Code via the CLI, it's very cautious about running anything that writes to the database, and the new Claude plan permissions system is pretty aggressive about reviewing any proposed action, even if I've given it blanket permission otherwise.
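The "hand destructive commands to the user" rule can be sketched as a simple gate in front of the execution path. This is a minimal illustration, not the commenter's actual framework; the pattern list and function names are hypothetical, and a real deployment would enforce this at the permission layer rather than with regexes alone.

```python
import re

# Illustrative (not exhaustive) patterns for operations that must be
# escalated to a human instead of being executed by the agent.
DESTRUCTIVE_PATTERNS = [
    r"\bDROP\s+(TABLE|DATABASE)\b",
    r"\bDELETE\s+FROM\b",
    r"\bTRUNCATE\b",
    r"\brm\s+-rf?\b",
]

def requires_human(command: str) -> bool:
    """True if the command matches a destructive pattern and must be
    handed to the user to implement."""
    return any(re.search(p, command, re.IGNORECASE)
               for p in DESTRUCTIVE_PATTERNS)

def dispatch(command: str) -> str:
    """Route a proposed command: escalate destructive ones, run the rest."""
    if requires_human(command):
        return f"ESCALATE: hand to user for review: {command!r}"
    return f"RUN: {command!r}"
```

The point of the design is that the deny list lives outside the model: even a fully "persuaded" LLM can only propose the command, never execute it.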

I've tested it a few times by telling it to go ahead ("I give you permission"), but it still gets stopped by the global Claude safety/permissions layer in Opus 4.7. IMO it's pretty robust.

Food for thought.

> specifically ask it to think hard before doing anything that gets close to the production data

This is recklessly negligent and I would personally not tolerate a coworker or report doing it. What's next, sending long-lived access tokens out over email and asking pretty please for nobody to cc/forward?

> specifically ask it to think hard before doing anything that gets close to the production data, etc.

The standard rule is that you never let your developers touch the production instance. So I can't see why an LLM would get a break.

"I've put enough safety around the bomb that the bomb is worth using. The other people that exploded just didn't have enough safety but I do !"

>> LLMs can research what a tool does before calling it though

That's stretching the definition of 'research'; it basically checks whether the text is a close enough match.

A delete can occur in various contexts, including safe ones. The model simply checks whether a close enough match is available and executes; it doesn't know whether what it is doing is safe.

Unfortunately, a wide variety of such unsafe behaviours can show up. I'd even say that, for something that acts without understanding what it's doing, any write operation of any kind can be deemed unsafe.
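The "close enough match" problem above can be shown with a deliberately naive checker (the function and queries are hypothetical): a keyword match flags a safe read that merely mentions deletion, while missing a genuinely destructive statement that avoids the keyword.

```python
# A naive "close enough" text match has no notion of context:
# it flags safe operations and can equally miss unsafe ones.
def naive_is_destructive(command: str) -> bool:
    return "delete" in command.lower()

# Safe: only reads rows that happen to mention deletion.
safe = "SELECT * FROM audit_log WHERE action = 'delete'"

# Unsafe: destroys data, but never uses the word "delete".
unsafe = "TRUNCATE TABLE audit_log"

naive_is_destructive(safe)    # flagged, though it only reads data
naive_is_destructive(unsafe)  # missed, though it destroys data
```

This is why surface-level text matching is a poor substitute for actually understanding what an operation does.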