Comment by andai

1 hour ago

> Since they believe AIs could become our moral superiors, they argue it’s actually wrong to try to keep the machines down, or even to align them with human values, as most AI companies aim to do.

It's possible that human morality is not sufficient to solve the problems with the world. It wasn't designed to operate at that scale.

It's also quite likely that a "superior" morality, one actually built for a global scale, would be incomprehensible (likely requiring superintelligence even to access) and alien or disturbing from our vantage point.

That is, it would look immoral to us.

(There's also the thing about civilization already being aligned against the ecosystem and against the happiness of its constituents, but that's kind of a separate discussion.)

In other words, what is good for our sanity or our lakes may be bad for GDP. (Well, I guess you don't need superintelligence to tell you that.) We already know what we need to do, and we're already not willing to do it.

I agree with all these observations.

This is the best argument for successionism IMO. If you can be confident that you are creating a BDFL (benevolent dictator for life) that is genuinely better than human leaders (a fairly low bar), then it seems like a good trade, unless you are quite optimistic about humanity’s prospects for improvement.

The problem, of course, is how to be confident you are creating a good BDFL and not handing control of humanity’s future to a successor that is indifferent at best and deceptive or malicious at worst.

An especially thorny problem: suppose we succeed on all these difficult alignment problems, Claude Omega really is super-rational and super-moral, and we all vote to make them president of Earth. Things might go great for a while. But how could you be confident that a self-modifying agent will retain its values as it grows and retrains itself?

This is where the LessWrong folks’ explorations into decision theory really come into play: morality becomes very weird in the face of self-modifying agents. A lot of human moral intuitions break when the principals are able to modify their own code. (See Timeless Decision Theory for an attempt to solve these problems.)

I think the summary is: if you hand control over to a self-modifying AI that is anything like our current systems, it will go very badly.

  • Any supposed "AI BDFL" will be controlled by a human. The base concept is inherently flawed.

    • No, that’s not what AI succession means. This all supposes an AI entity powerful and capable enough that no human is in control.

      Rather than simply asserting that your interlocutors are wrong, you are welcome to advance an argument for why you think this is the case.

> It's possible that human morality is not sufficient to solve the problems with the world. It wasn't designed to operate at that scale.

> i.e. it would look immoral to us.

Then I doubt it actually solves the problem, especially because I'd be willing to bet it would be almost impossible to get a large enough group of people to agree on what the actual problem is.

*not yet willing to do it... But look at the generations coming up who are dealing with this technological capture of humanity. They don't like it.