Comment by devolving-dev

20 days ago

Don't you have the same issue when you hire an employee and give them access to your systems? If the AI seems capable of avoiding harm and motivated to avoid harm, then the risk of giving it access is probably not greater than the expected benefit. Employees are also trying to maximize paperclips in a sense, they want to make as much money as possible. So in that sense it seems that AI is actually more aligned with my goals than a potential employee.

7 comments

devolving-dev

johndough 20 days ago

I do not believe that LLMs fear punishment like human employees do.

devolving-dev 20 days ago
Whether driven by fear or by their model weights or whatever, I don't think that the likelihood of an AI agent, at least the current ones like Claude and Codex, acting maliciously to harm my systems is much different than the risk of a human employee doing so. And I think this is the philosophical difference between those who embrace the agents, they view them as akin to humans, while those who sandbox them view them as akin to computer viruses that you study within a sandbox. It seems to me that the human analogy is more accurate, but I can see arguments for the other position.
- snowmobile 19 days ago
  
  Sure, current agents are harmless, but that's due to their low capability, not due to their alignment with human goals. Can you explain why you'd view them as more similar to humans than to computer viruses?
  
  2 replies →

pluralmonad 19 days ago

Its been pretty well documented that LLMs can be social engineered as easily as a toddler. Equating the risk to that of a human employee seems wrong. I'm sure the safeguards will improve, but for now the risk is still there.

snowmobile 19 days ago

An AI has no concept of human life nor any morals. Sure, it may "act" like it, but trying to reason about its "motivations" is like reasoning about the motivations of smallpox. Humans want to make money, but most people only want that in order provide a stable life for their family. And they certainly wouldn't commit mass murder for a billion dollars, while an AGI is capable of that.

> So in that sense it seems that AI is actually more aligned with my goals than a potential employee.

It may seem like that but I recommend you reading up on different kinda of misalignment in AI safety.