Comment by xg15

7 months ago

Yeah, that seems ridiculous. However, the cynic in me feels that we don't actually need some LLM magically gaining self-awareness, persistent memory and leet hacker skillz to be dangerous. There seems to be no shortage of projects and companies that want to wire up LLMs to all kinds of systems, no matter how ill-suited.

I find this a bit problematic when combined with the fact that the training data very likely contained hundreds of bad sci-fi novels that described exactly the kind of "AI running amok" scenarios that OpenAI is ostensibly defending against. Some prompts could trigger a model to "re-enact" such a scene - not because it has a "grudge against its master" or some other kind of hidden agenda but simply because it was literally in its training data.

E.g. imagine some LLM-powered home/car assistant that is being asked in a panicked voice "open the car doors!" - and replies with "I'm afraid, I can't do that, Dave", because this exchange triggered some remnant of the 2001 Space Odyssey script that was somewhere in the trainset. The more irritated and angry the user gets at the inappropriate responses, the more the LLM falls into the role of HAL and doubles down on its refusal, simply because this is exactly how the scene in the script played out.

Now imagine that the company running that assistant gave it function calls to control the actual door locks, because why not?

This seems like something to keep in mind at least, even if it doesn't have anything to do with megalomaniacal self-improving super-intelligences.

0 comments

xg15

No comments yet

Contribute on Hacker News ↗