Comment by ben_w
2 years ago
Mm.
1. Depends what you mean by AGI, as everyone means a different thing by each letter, and many people mean a thing not covered by any of those letters. If you mean super-human skill level, I would agree we're not there: given how inefficiently transformers learn, there aren't enough examples to reach it on that specific metric. They are already super-human in breadth and speed, though.
2. No.
Alignment is not at that level of abstraction.
Dig deep enough and free will is an illusion in us and in any AI we create.
You do not have the capacity to decide your values. The often-given example is parents loving their children: they can't just decide not to, and if they think they can, that's because they never really loved them in the first place.
Alignment of an AI with our values can be to any degree, but for those who fear some AI will cause our extinction, this question is at the level of "how do we make sure it's not monomaniacally interested in specifically the literal thing it was asked to do", because if it always does what it's told without any human values, and someone asks it to make as many paperclips as possible, it will.
Right now, the best guess anyone has for alignment is RLHF. RLHF is not a lobotomy: even ignoring how wildly misleading that metaphor is, RLHF is where the capability for instruction following came from, and it is the only reason LLMs got good enough for these kinds of discussions (unlike, say, LSTMs).
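For anyone who hasn't looked at what RLHF actually optimises, here's a toy sketch of its first step, reward modelling: fit a model so the responses humans preferred score higher, and that learned reward is what the policy later gets fine-tuned against. Everything below is synthetic and purely illustrative (real reward models are neural networks over text, not random feature vectors):

    import numpy as np

    rng = np.random.default_rng(0)

    # Pretend each candidate response is an 8-dimensional feature vector and
    # humans picked which of two responses they preferred (all data synthetic).
    dim = 8
    true_w = rng.normal(size=dim)                       # "what humans like"
    chosen   = rng.normal(size=(200, dim)) + 0.5 * true_w
    rejected = rng.normal(size=(200, dim)) - 0.5 * true_w

    w = np.zeros(dim)   # reward model: r(x) = w . x
    lr = 0.1
    for _ in range(500):
        # Bradley-Terry: P(chosen beats rejected) = sigmoid(r_chosen - r_rejected)
        margin = (chosen - rejected) @ w
        p = 1.0 / (1.0 + np.exp(-margin))
        # gradient ascent on the log-likelihood of the human preference data
        w += lr * ((1.0 - p)[:, None] * (chosen - rejected)).mean(axis=0)

    agreement = ((chosen - rejected) @ w > 0).mean()
    print(f"reward model matches the human preference on {agreement:.0%} of pairs")

The point being: the optimisation target is literally "what humans said they preferred", which is the opposite of cutting something out of the model.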
3. Agree that getting paperclipped is much more likely.
Roko's Basilisk was always stupid.
First, it fails for the same reason Pascal's Wager does: two gods tell you they are the one true god, and each says that if you follow the other one you will get eternal punishment. No way to tell them apart.
Second, you're only in danger if they are actually created, so successfully preventing that creation is obviously better than creating it out of a fear that it will punish you if you try and fail to stop it.
That said, LLMs do understand lying, so I don't know why you mention this?
4. Transistors outpace biological synapses by roughly the same ratio by which marathon runners outpace continental drift (rough numbers in the sketch at the end of this comment).
I don't monitor my individual neurons, but I could if I wanted to pay for the relevant hardware.
But even if I couldn't, there is no "ergo" leading from reasonable passwords, cert rotations, etc. to safety, not only because enough things can be compromised by zero-days (or, indeed, by very old bugs we knew about years ago but which someone forgot to patch), but also for the same reason those don't stop humans rising from "failed at art" to "world-famous dictator".
Air-gapped systems are not an impediment to an AI that has human helpers, and there will be many of those: some will know they're following an AI and think that helping it is the right thing to do (Blake Lemoine), while others may be fooled. We are going to have actual cults form around AI, and there will be a Jim Jones who hooks some model up to some robots to force everyone to drink poison. No matter how it happens, air gaps don't do much good when someone gives the thing a body to walk around in.
But even if air gaps were sufficient, just look at how humanity has been engaging with AI to date: the moment it was remotely good enough, the AI got a publicly accessible API; the moment it got famous, someone put it in a loop and asked it to try to destroy the world; it came with a warning message saying not to trust it, and lawyers got reprimanded for trusting it instead of double-checking its output.
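On the speed comparison in point 4, here is a back-of-envelope check; every number is a coarse order-of-magnitude assumption of mine rather than a measurement, but both ratios come out around a billion:

    # rough order-of-magnitude figures, not measurements
    transistor_hz = 3e9                          # GHz-scale switching
    neuron_hz = 3.0                              # average cortical firing rate, a few Hz
    runner_m_per_s = 42_195 / (2 * 3600)         # ~2-hour marathon pace
    drift_m_per_s = 0.025 / (365 * 24 * 3600)    # ~2.5 cm per year of continental drift

    print(f"transistor / synapse: {transistor_hz / neuron_hz:.0e}")       # ~1e9
    print(f"runner / drift:       {runner_m_per_s / drift_m_per_s:.0e}")  # ~7e9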