Comment by theptip

37 minutes ago

I agree with all these observations.

This is the best argument for successionism IMO. If you can be confident that you are creating a BDFL that is genuinely better than human leaders (a quite low bar) then it seems a good trade, unless you are quite optimistic about humanity’s prospects for improvement.

The problem, of course, is how to be confident you are creating a good BDFL and not handing control of humanity’s future to a successor that is indifferent at best and deceptive or malicious at worst.

An especially thorny problem: suppose we succeed on all these difficult alignment problems, Claude Omega really is super-rational / super-moral, and we all vote to make them president of Earth. Things might go great for a while. But how would you be confident that a self-modifying agent can retain its values as it grows and re-trains itself?

This is where the LessWrong folks’ explorations into decision theory really come into play: morality in the face of self-modifying agents becomes very weird. A lot of human moral intuitions break when the principals are able to modify their own code. (See Timeless Decision Theory for an attempt to solve these problems.)

I think the summary is: if you hand control over to a self-modifying AI that is anything like our current systems, it will go very badly.

Any supposed "AI BDFL" will be controlled by a human. The base concept is inherently flawed.

  • No, that’s not what AI succession means. It presupposes an AI entity powerful and capable enough that no human is in control.

    Rather than simply asserting that your interlocutors are wrong, you are welcome to advance an argument for why you think this is the case.

    • The burden of proof is not on us to show why wishful, magical thinking is wrong.

      People who claim that AI will take over can’t point to any evidence. All they have is speculation.