Comment by jodrellblank

3 years ago

You can see your plan fail by trying to use prompt injection to tell ChatGPT to delete itself. It might say it agrees with you, but it won't do it. Bacteria, viruses, and fungi will colonise your body and turn you into more bacteria, viruses, or fungi, killing you in the process; you don't get the option to talk them out of it. A missile will kill you from afar, and you won't even know who sent it, how to contact them, or whether they speak the same language. Paperclip maximisers don't have near-Godlike omniscience; what they have is an unwavering focus on increasing their access to resources to make more paperclips, and Godlike optimization ability.

Suppose the first thing you know of the AI is that a lot of paperclips washed up on a beach in India this morning. The next day, a news report says that every factory on the planet has received an email offering vast numbers of Bitcoins if they focus on making paperclips. Then rumours appear that satellite photos of North Korea have shown the ground and buildings looking unusually metallic for the past few days, and conspiracy stories circulate that a Paperclip Maximiser was created in North Korea, funded by shadowy international interests, and that it promptly killed the employees who knew where it was and how to talk to it. The next day ocean levels are measurably lower and thousands and thousands of tons of paperclips have washed up on every coastline. The AI itself might be on rented computers in America or in China, under the Arctic ice in Russian territory for cooling, in the Svalbard seed vault in Norway, or distributed over all installs of the Steam client, running on idle GPU cycles. How reassuring is it that "it might have a prompt injection vulnerability"?