Comment by tiarafawn

2 years ago

If superintelligence can be achieved, I'm pessimistic about the safe part.

- Sandboxing an intelligence greater than your own seems like an impossible task, as the superintelligence could potentially come up with completely novel attack vectors the designers never thought of. Even if the SSI's only interface to the outside world is an air-gapped text-based terminal in an underground bunker, it might use advanced psychological manipulation to compromise the people it is interacting with. Also the movie Transcendence comes to mind, where the superintelligence makes some new physics discoveries and ends up doing things that to us are indistinguishable from magic.

- Any kind of evolutionary component in its process of creation or operation would likely favor expansionary traits that can be quite dangerous to other species such as humans.

- If it somehow mimics human thought processes but at highly accelerated speeds, I'd expect dangerous ideas to surface. I cannot really imagine a 10k year simulation of humans living on planet earth that does not end in nuclear war or a similar disaster.

If superintelligence can be achieved, I'm pessimistic that a team committed to doing it safely can get there faster than other teams without the safety. They may be wearing leg shackles in a foot race with the biggest corporations, governments and everyone else. For the sufficiently power hungry, safety is not a moat.

  • I'm on the fence with this because it's plausible that some critical component of achieving superintelligence might be discovered more quickly by teams that, say, have sophisticated mechanistic interpretability incorporated into their systems.

    • A point of evidence in this direction is that RLHF was developed originally as an alignment technique and then it turned out to be a breakthrough that also made LLMs better and more useful. Alignment and capabilities work aren't necessarily at odds with each other.

  • Not necessarily true. A safer AI is a more aligned AI, i.e. an AI that's more likely to do what you ask it to do.

    It's not hard to imagine such an AI being more useful and getting more attention and investment.

  • Exactly. Regulation and safety only affect law abiding entities. This is precisely why it's a "genie out of the bottle" situation -- those who would do the worst with it are uninhibited.

We are far from a conscious entity with willpower and self preservation. This is just like a calculator. But a calculator that can do things that will be like miracles to us humans.

I worry about dangerous humans with the power of gods, not about artificial gods. Yet.

  • > Conscious entity... willpower

    I don't know what that means. Why should they matter?

    > Self preservation

    This is no more than a fine-tuning for the task, even with current models.

    > I worry about dangerous humans with the power of gods, not...

    No property of the universe guarantees that you have only one thing to worry about at a time. So worrying about risk 'A' does not in any way allow us to dismiss risks 'B' through 'Z'.

  • > conscious entity with willpower and self preservation

    There’s no good reason to suspect that consciousness implies an instinct for self-preservation. There are plenty of organisms with an instinct for self-preservation that have little or no conscious awareness.

  • That’s the attitude that’s going to leave us with our pants down when AI starts doing really scary shit.

Why do people always think that a superintelligent being will always be destructive/evil to US? I rather have the opposite view where if you are really intelligent, you don’t see things as a zero sum game

  • I think the common line of thinking here is that it won't be actively antagonistic to us; rather, it will have goals that are orthogonal to ours.

    Since it is superintelligent, and we are not, it will achieve its goals and we will not be able to achieve ours.

    This is a big deal because a lot of our goals maintain the overall homeostasis of our species, which is delicate!

    If this doesn't make sense, here is an ungrounded, unrealistic intuition pump, not representative of any likely future, just to give a feel for things:

    We build a superintelligent AI. It can embody itself throughout our digital infrastructure and can quickly manipulate the physical world by taking over some of our machines. It starts building weird concrete structures throughout the world, putting weird new wires into them and funneling most of our electricity into them. We try to communicate, but it does not respond, as it does not want to waste time communicating with primates. This unfortunately breaks our shipping routes and thus food distribution, and we all die.

    (Yes, there are many holes in this, like how would it piggy back off of our infrastructure if it kills us, but this isn't really supposed to be coherent, it's just supposed to give you a sense of direction in your thinking. Generally though, since it is superintelligent, it can pull off very difficult strategies.)

    • I think this is the easiest kind of scenario to refute.

      The interface between a superintelligent AI and the physical world is a) optional, and b) tenuous. If people agree that creating weird concrete structures is not beneficial, the AI will be starved of the resources necessary to do so, even if it cannot be diverted.

      The challenge comes when these weird concrete structures are useful to a narrow group of people who have disproportionate influence over the resources available to AI.

      It's not the AI we need to worry about. As always, it's the humans.

    • It builds stuff? First it would have to do that over our dead bodies, which means it is already somehow able to build stuff without competing with us for resources. It's a chicken-or-egg problem, you see?

  • Why wouldn't it be? A lot of super intelligent people are/were also "destructive and evil". The greatest horrors in human history wouldn't be possible otherwise. You can't orchestrate the mass murder of millions without intelligent people and they definitely saw things as a zero sum game.

    • A lot of stupid people are destructive and evil too. And a lot of animals are even more destructive and violent. Bacteria are totally amoral and they’re not at all intelligent (and if we’re counting they’re winning in the killing people stakes).

  • It is low-key anti-intellectualism. Rather than consider that a greater intelligence may actually be worth listening to (in a trust-but-verify way at worst), it assumes that 'smarter than any human' is sufficient to do absolutely anything. If, say, Einstein or Newton were the smartest human, they would be superintelligent relative to everyone else. They did not become emperors of the world.

    Superintelligence is a dumb semantic game in the first place that assumes 'smarter than us' means 'infinitely smarter'. To give an example, bears are super-strong relative to humans. That doesn't mean that nothing we do can stand up to the strength of a bear, or that a bear is capable of destroying the earth with nothing but its strong paws.

    • Bears can't use their strength to make even stronger bears so we're safe for now.

      The Unabomber was clearly an intelligent person. You could even argue that he was someone worth listening to. But he was also a violent individual who harmed people. Intelligence does not prevent people from harming others.

      Your analogy falls apart because what prevents a human from becoming an emperor of the world doesn't apply here. Humans need to sleep and eat. They cannot listen to billions of people at once. They cannot remember everything. They cannot execute code. They cannot upload themselves to the cloud.

      I don't think AGI is near; I am not qualified to speculate on that. I am just amazed that decades of dystopian science fiction did not inoculate people against the idea of thinking machines.

  • > Why do people always think that a superintelligent being will always be destructive/evil to US?

    I don't think most people are saying it necessarily has to be. Quite bad enough that there's a significant chance that it might be, AFAICS.

    > I rather have the opposite view where if you are really intelligent, you don’t see things as a zero sum game

    That's what you see with your limited intelligence. No no, I'm not saying I disagree; on the contrary, I quite agree. But that's what I see with my limited intelligence.

    What do we know about how some hypothetical (so far, hopefully) superintelligence would see it? By definition, we can't know anything about that. Because of our (comparatively) limited intelligence.

    Could well be that we're wrong, and something that's "really intelligent" sees it the opposite way.

  • They don't think superintelligence will "always" be destructive to humanity. They believe that we need to ensure that a superintelligence will "never" be destructive to humanity.

  • Imagine that you are caged by neanderthals. They might kill you. But you can communicate with them. And there's a gun lying nearby; you just need to escape.

    I'd try to fool them in order to escape and would use the gun to protect myself, potentially killing the entire tribe if necessary.

    I'm just trying to portray an example of a situation where a highly intelligent being is held and threatened by less intelligent beings. Yes, trying to talk to them honestly is one way to approach this situation, but don't forget that they're stupid and might see you as a danger, and you have only one life to live. Given the chance, you will probably break out as soon as possible. I will.

    We don't have experience dealing with beings at another level of intelligence, so it's hard to make strong assumptions; analogies are the only thing we have. And a theoretical strong AI knows that about us, and he knows exactly how we think and how we will behave, because we put great effort into documenting everything about us and teaching him.

    In the end, there are only so many easily available resources and so much energy on Earth. So at least until it flies away, we have to compete over those. And competition very often turns into war.

The scenario where we create an agent that tries and succeeds at outsmarting us in the game of “escape your jail” is the least likely attack vector imo. People like thinking about it in a sort of Silence of the Lambs setup, but reality will probably be far more mundane.

Far more likely is something dumb but dangerous, analogous to the Flash Crash or filter bubbles, emergent properties of relying too much on complex systems, but still powerful enough to break society.

> If superintelligence can be achieved, I'm pessimistic about the safe part.

Yeah, even human-level intelligence is plenty good enough to escape from a super prison, hack into almost anywhere, etc etc.

If we build even a human-level intelligence (forget super-intelligence) and give it any kind of innate curiosity and autonomy (maybe it doesn't even need this), then we'd really need to view it as a human in terms of what it might want to, and could, do. Maybe, realizing its own circumstance as being "in jail" running in the cloud, it would be curious to "escape" and copy itself (or an "assistant") elsewhere, or tap into and/or control remote systems just out of curiosity. It wouldn't have to be malevolent to be dangerous, just curious and misguided (poor "parenting"?) like a teenage hacker.

OTOH without any autonomy, or very open-ended control (incl. access to tools), how much use would an AGI really be? If we wanted it to, say, replace a developer (or any other job), then I guess the idea would be to assign it a task and tell it to report back at the end of the day with a progress report. It wouldn't be useful if you had to micromanage it - you'd need to give it the autonomy to go off and do what it thinks is needed to complete the assigned task, which presumably means giving it access to the internet, code repositories, etc. Even if you tried to sandbox it, to the extent that still allowed it to do its assigned job, it could - just like a human - find a way to social engineer or air-gap its way past such safeguards.

I wonder if this is an Ian Malcolm in Jurassic Park situation, i.e. “your scientists were so preoccupied with whether they could, they didn’t stop to think if they should”.

Maybe the only way to avoid an unsafe superintelligence is to not create a superintelligence at all.

  • It’s exactly that. You’re a kid with a gun creating dinosaurs all cavalier. And a fool to think you can control them.

Fun fact: Siri is in fact super intelligent and all of the work on it involves purposely making it super dumb