← Back to context

Comment by eddieroger

2 months ago

I've never hired an assistant, but if I knew that they'd resort to blackmail in the face of losing their job, I wouldn't hire them in the first place. That is acting like a jerk, not like an assistant, and demonstrating self-preservation that is maybe normal in a human but not in an AI.

From the AI’s point of view is it losing its job or losing its “life”? Most of us when faced with death will consider options much more drastic than blackmail.

  • From the LLM's "point of view" it is going to do what characters in the training data were most likely to do.

    I have a lot of issues with the framing of it having a "point of view" at all. It is not consciously doing anything.

  • But the LLM is going to do what its prompt (system prompt + user prompts) says. A human being can reject a task (even if that means losing their life).

    LLMs cannot do other thing than following the combination of prompts that they are given.

> I've never hired an assistant, but if I knew that they'd resort to blackmail in the face of losing their job, I wouldn't hire them in the first place.

How do you screen for that in the hiring process?