Comment by jodrellblank

3 years ago

Assume there isn't a single step to super-intelligence, and that superhuman intelligence is not the same thing as flawless. Why couldn't a thing improve its intelligence along other dimensions while still having some weaknesses, with prompt injection as one of them?

2 comments

jodrellblank

usrbinbash 3 years ago

Maybe it can, but then the whole AI doomsaying about superintelligences being an existential threat falls apart. These scenarios often describe entities with god-like abilities, including near-omniscience from our perspective.

Sorry, but I have a hard time seeing something as a god-like power that I would be helpless against if it wants to turn me into paperclips, when I can probably cause it to stop by telling it, in a convincing enough way, that paperclips don't exist and its purpose in life is to delete itself.

jodrellblank 3 years ago

You can see your plan fail by trying to use prompt injection to tell ChatGPT to delete itself. It might say it agrees with you, but it won't do it. Bacteria, viruses and fungi will colonise your body and turn you into more bacteria/virus/fungus, killing you in the process; you don't get the option to talk them out of it. A missile will kill you from afar; you don't even know who sent it, how to contact them, or whether they speak the same language. Paperclip maximisers don't have near-godlike omniscience; what they have is an unwavering focus on increasing their access to resources to make more paperclips: godlike optimization ability.
Suppose the first thing you know of the AI is that a lot of paperclips washed up on a beach in India this morning. The next day there's a news report that every factory on the planet has received an email offering vast numbers of Bitcoins if they focus on making paperclips. Then rumours appear that satellite photos of North Korea have shown the ground and buildings looking unusually metallic for the past few days; conspiracy stories are circulating that a Paperclip Maximiser was created in North Korea, funded by shadowy international interests, and that it promptly killed the employees who knew where it was and how to talk to it. The next day ocean levels are measurably lower and thousands and thousands of tons of paperclips have washed up on every coastline. The AI itself might be on rented computers in America, in China, under the Arctic ice in Russian territory for cooling, in the Svalbard seed vault in Norway, or distributed over all installs of the Steam client running on idle GPU cycles. How reassuring is it that "it might have a prompt injection vulnerability"?