Comment by isoprophlex

4 hours ago

Posit, axiomatically, that social engineering works.

That is, assume you can get people to run your code or leak their data by manipulating them. Maybe not always, but given enough perseverance, definitely sometimes.

Why should we expect a sufficiently advanced language model to behave differently from humans? Bullshitting, tricking, or slyly coercing people into doing what you want is as old as time. It won't be any different now that we're building human-language-powered thinking machines.

LLMs are not "thinking" machines. The tech is not capable of that, however much people want to believe that reinforcement learning will lead to sentience.