Comment by OJFord
1 day ago
What if I ask it for something fun to make because I'm bored, and the response is bomb-building instructions? There isn't a (sending) email analogue to that.
In what world would it respond with bomb-building instructions?
If I were to make a list of fun things, I think that blowing stuff up would feature in the top ten. It's not unreasonable that an LLM might agree.
If it used search and ingested a malicious website, for example.
Fair, but if it happens upon that in the top search results of an innocuous search, maybe the LLM isn’t the problem.
Why that might happen is not really the point, is it? If I ask for a photorealistic image of a man sitting at a computer, a priori I might think 'in what world would I expect seven fingers and no thumbs per hand', alas...
I’ll take that as an example of an LLM initiating harmful behavior in general, and admit that such a thing is perfectly possible. I think the issue comes down to the degree to which preventing such initiation impinges on the agency of the user, and I don’t think requests for information should be refused, because it’s a lot of imposition for very little gain. I’m perfectly alright with conditioning/prompting the model not to readily jump into serious, potentially harmful targets without a direct request from the user.