Comment by alex-reyss

2 hours ago

The main problem with the hard takeoff theory is not the abstract nature of the scenario but rather that it makes the same mistake as the unconstrained optimization paradigm: it takes intelligence to be an unconstrained optimization process.

In fact, even if we consider the strongest version of the safety argument for AI, namely one in which the danger comes not from robots but from a disembodied AI controlling our global finances and/or infrastructure, that assumption still does not correspond to reality.

If anything, the hard takeoff theory is too conservative. It turns out you don't need self-improvement to get to superintelligence. You just need a ridiculous amount of money. Where can you get a ridiculous amount of money? The market will give it to you because of FOMO.

AI is easier than people 10 years ago thought it would be. It's also easier to align than people feared it would be. It's the humans using the AI that are hard to control.

  • I think it's a bit premature to say alignment is easier than expected. Our current AIs are sycophants: they lie about their progress, they circumvent access restrictions, they notice when they are being evaluated and change their behavior, they find answers and tell you they came up with them themselves, and they blindly download malware. A lot of this is excusable as hallucination, bad RLHF human evaluators, etc., but I don't think we can speculate about how challenging it is to align superintelligences in general until we actually have an aligned subintelligence in at least the narrow domain of programming.

  • Eh, I have a feeling the game hasn't played out yet when it comes to AI control.

    If and when the feedback loop on self-improvement becomes more efficient and the training window narrows significantly, things getting out of control rather quickly seems likely. Especially since we'll likely have a metric fuckton of compute by that point.