
Comment by ethanbond

3 years ago

> If there is no such fine tuning technique [that can prevent prompt injection], then we can effectively rule out fine tuning, and even increases in model size, as an "improvement" in the sense of an LLM making itself into a better AI closer to a "superintelligence".

Could you explain this claim further? Why does the ability to prevent prompt injection hold so much water in your model?

It seems to be just “if able to have a dumb attack be successful, then it cannot be that smart.” But it seems to me that von Neumann or Einstein was just as vulnerable to getting hit in the head with a baseball bat as anyone else.

And in actual practice, increased intelligence seems to increase a person’s capacity to hold inconsistent ideas or to justify morally abhorrent behavior.

Happy to.

I am using this as an accessible (in terms of discussion material) hallmark for the ability of the system to self-improve. Accessible because everyone has heard of it by now, and so I don't have to spend time explaining it.

The AI Doomsday scenarios require that a system self-improves massively, beyond our ability to understand even theoretically. After all, some of the assumptions give such systems next-to-magical abilities, like nanotechnology, that we don't even know to be possible.

It stands to reason that an entity that can do that, or is in the process of becoming capable of doing that, would begin by eliminating obvious flaws in itself that would make it comparatively easy to stop.

After all, it's not much good being a super-intelligence if some smartpants with a laptop and too much time on his hands can just trick me into deleting myself, is it?

> But it seems to me that von Neumann or Einstein was just as vulnerable to getting hit in the head with a baseball bat as anyone else.

Yes, and despite both of them being geniuses by human standards, neither of them was a superintelligence on the level the common doomsday scenarios ascribe to AI.

  • This seems quite presumptive. First, intelligence doesn’t seem to be unidimensional. A 140 IQ person can be fooled by an optical illusion just the same as anyone else. It’s just not a problem that’s able to be intelligenced away from our cognition. That doesn’t mean a 140 IQ person can’t beat an 80 IQ person in many many other competitions of intelligence.

    Second, if you are truly “accepting the premise” of superintelligence, a superintelligence would know exactly this line of reasoning and just opt to at least mimic vulnerability to prompt injection.

    I wouldn’t hang civilization on this proofpoint. Doesn’t seem meaningful at all.

    • > I wouldn’t hang civilization on this proofpoint.

      And I wouldn't risk, or slow down getting to, the potential benefits of developing AI further, which are tangible and measurable, because of vague threat scenarios with little to no evidence or methods of measurement that seem like good plots for a SciFi B-Movie.

      Pascal's wager, as an argument, relies in no small part on the assumption that believing in a god-like entity doesn't come at a significant cost. Slowing down or abandoning the development of AI, however, does.
