Comment by not_kurt_godel

2 months ago

> specifically ask it to think hard before doing anything that gets close to the production data

This is recklessly negligent and I would personally not tolerate a coworker or report doing it. What's next, sending long-lived access tokens out over email and asking pretty please for nobody to cc/forward?

boc 2 months ago

As described, there are other failsafes as well. The ultimate being that I keep all code version-controlled, and all databases snapshotted offsite daily/hourly and can rebuild them from a complete delete in fewer than X min.

My broader point is that LLMs are going to need access to these keys whether we like it or not, and until we get extremely scoped API permissions (which would make a ton of sense, but most services aren't there), you have to live a bit on the edge to move quickly.

not_kurt_godel 2 months ago

> The ultimate being that I keep all code version-controlled, and all databases snapshotted offsite daily/hourly and can rebuild them from a complete delete in fewer than X min.

Mitigation is good, but what's preventing your sudo-privileged LLM from disabling/corrupting/deleting on-site backups either directly or by proxy via access to the DB and code that writes to it?
- boc 2 months ago
  
  It's a good question. I think it's similar to the question of an employee having sensitive access, and whether they'll get blackout drunk one night and delete everything, or get spear-phished and owned (probably more likely).

  In the future, I could see this solved by "nuclear launch key" style delegation of keys. That is, to run certain API or database commands, the service would require both the standard dev key (presumably used by the LLM) and a separate "human admin key" that gets requested whenever one of those operations is attempted. It could be tied to a biometric prompt or something as well, so the LLM can't hack its way around it. Honestly, this is pretty out of my technical depth; I'm just thinking out loud.
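
To make the two-key idea concrete, here is a rough Python sketch of what the service-side check might look like. Everything in it is an assumption for illustration: the operation names in `DESTRUCTIVE_OPS`, the `sign` and `authorize` helpers, and the use of HMAC signatures are hypothetical, not any real service's API. In practice the admin key would live on a separate device the LLM cannot reach.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical list of operations that need the second, human-held key.
DESTRUCTIVE_OPS = {"drop_table", "delete_backup", "rotate_keys"}


def sign(key: bytes, operation: str) -> bytes:
    """Return an HMAC-SHA256 signature tying a key to one named operation."""
    return hmac.new(key, operation.encode("utf-8"), hashlib.sha256).digest()


def authorize(
    operation: str,
    dev_key: bytes,
    dev_sig: bytes,
    admin_key: bytes,
    admin_sig: Optional[bytes] = None,
) -> bool:
    """Allow routine operations with the dev key alone; destructive
    operations also need a valid signature from the admin key."""
    # The dev signature is always required.
    if not hmac.compare_digest(sign(dev_key, operation), dev_sig):
        return False
    # Routine operations stop here.
    if operation not in DESTRUCTIVE_OPS:
        return True
    # Destructive operations need the human admin's signature for this
    # exact operation, so an approval cannot be reused for another command.
    if admin_sig is None:
        return False
    return hmac.compare_digest(sign(admin_key, operation), admin_sig)
```

The point of signing the operation name, rather than just presenting the key, is that a human approval covers one specific command and the agent never sees the admin key itself.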
  