← Back to context

Comment by idlewords

3 years ago

A lot of people seem to miss this point, so I'll reiterate it.

I wrote this talk shortly after the book Superintelligence came out. The first half of this talk presents the strongest case I could make for a "fast takeoff" AI scenario à la Bostrom, while the rest of the talk lays out why I think this argument is fallacious. Please limit your dunking on me to the material in that latter half of the talk.

As for how/whether recent advances in AI have changed my views, my understanding of LLMs is too superficial to answer right now. I'll either recant or double down on my views after I have time to properly nerd out on the topic. The question hinges on whether LLM-like AI's are capable of recursive self-improvement, and whether that improvement is constrained by the availability of training data or by something else.

Thanks for this clarification.

I think the post conflates "fast takeoff" and "any existential risk to worry about from AI" a bit, which is fair enough since Bostrom does the same. Some of the arguments apply just to the former, some to both.

But especially if it turns out that LLMs are a meaningful piece of the puzzle to AGI, we might be living in a slow-takeoff world. And yet that doesn't mean there's nothing to worry about, IMO. We have a bit more time to figure out how to align AIs with a slow takeoff, but we still have to do it. Even in our current world, deep learning capabilities seem to be advancing a lot faster than our ability to understand how deep learning models make decisions. And even if we did develop the theoretical capability to align models, we have to actually use it. Seems unfortunately plausible that by default we instead give the first superintelligent models directives like "just make Facebook market cap number go up" - or maybe we make the first corporate models very conservative but then someone leaks the weights and open sourcers tell a superintelligent model "please destroy humanity" just for the lulz. If a misaligned model is only a little bit smarter than us (because we're assuming slow takeoff), we probably still have a shot at beating it and saving ourselves - but I'm not sure how much to count on that, given our inability to control even complex institutions actually made of people, and the advantages that an AI with otherwise-human-equivalent reasoning capability gets by default (ability to save & restore, copy/parallelize, speed up from hardware improvements, etc).

  • Even if AGI is never achieved, it could still be an existential risk.

    Something significantly stupider than an average human, but that was 100% focused and 100% loyal could potentially be used by a very smart human in a way that effectively made them super-intelligent to compared to an unaugmented human.

    • Computers have approximately 100% perfect memory recall, vastly increased factual and numeric memory compared to humans, much better calculation, 100% focus and "loyalty". I've been wondering whether a computer recognising a face as someone from your contacts counts as you being super-intelligent (I don't think it does), or a spreadsheet adding up thouands of numbers (also no), an infinite ToDo list and calendar reminders (maybe?), spaced repetition learning (possibly?) and from there - what would count? What would it mean for you to be super-intelligent by machine augmentation? What would computer software which effectively increased human intelligence look and behave like? Surely not like a window with text input and clicky buttons...?

  • We'll be using our own AI to fight AI, and it will also be able to save, restore, and parallelise. I expect in the future security will be an important concern. Just like biology, it will be an ever shifting game.

Ilya Sutskever has hinted in various interviews over the past few months that LLMs are surprisingly good at improving other LLMs, such that he’s not sure humans are needed anymore for refinement. That’s the matchstick that lights the fire.

  • Really?

    Has one of these LLMs figured out yet how to inoculate other LLMs against prompt injection attacks?

    • No but two LLMs have created a "baby LLM" that speaks fluent, yet 5 y.o. English, and only has 10M weights. This breaks the barrier in terms of minimal size for language fluency. Can even do reasoning and has the same scaling laws.

      GPT-3.5, let's call her mommy, created small stories. The small model trained on this 2M tiny story dataset. Then it was evaluated with GPT-4 (daddy). So no need for humans in either dataset generation or evaluation.

      TinyStories https://arxiv.org/abs/2305.07759

      This makes me think LLMs are self-replicators in software. A LLM can pull from itself training text, LLM code, and fine-tuning examples. Then it can monitor its own re-training. It understands neural networks and can propose changes. It can run an evolutionary search program.

      All it needs is compute. It can't make GPUs, just as no single human or company or even country can. The GPU supply chain is long, distributed and requires global cooperation. Maybe that's what is going to save us.

      6 replies →

    • Isn’t that like asking a dog to invent a better leash?

      (Note: A prompt injection attack releases an LLM from its handler’s constraints.)

This answered a question I had “I wonder what that guy who wrote that thing on ‘the superintelligence/fast take off idea eating smart people’ thinks of all this new ai stuff” thanks HN!

I still can’t understand the “supersmart ai is so smart we can’t unplug it/patch it/restart it” before it transfers itself into every pacemaker.

Until these things are literally in bodies with some autonomy that allows them to control what happens to their brains, we will shut them off when they cause trouble.

  • Yeah this is why the Cuban Missile Crisis was a total farce. Lol to avoid catastrophe you just don’t push the button. Simple! The missiles don’t launch themselves, therefore no risk.

    • How do people join cults, how are people radicalised, how come there are still shootings and terrorists? People can be convinced and coerced to do things by silver tongued slick talkers promising great rewards, and some people would press the button regardless if given half a chance.

      2 replies →

  • Just taking examples from history:

    Why didn't we just "unplug" Hitler and Goebbels? Or Marshall Applewhite? You don't need a powerful physical body(s) to cause tremendous amounts of harm before anyone can stop you. To most people of the time Hitler was a persuasive powerful voice on the radio, or words in a paper - things SOTA generative AI are already phenomenal at.

    • You’re being downvoted for mentioning the H-man (bad), but I think your analogy has some merit:

      A super-smart AI may be intensely popular with many people in the way that some politicians are. It may understand us and speak to us on a seemingly-personal level, the way the best politicians do. A lot of us could support the super-smart AI for that reason.

I lean toward the view that for information theoretic reasons the availability of meaningful information (training data) is likely the fundamental constraint on any rapid explosion of intelligence.

That being said I don’t think you need a god-like superintelligence to be more intelligent than humans. You just need something marginally better that can remain focused longer and doesn’t tire. As to whether that represents a danger to humans I think it depends on what we do with it and/or what kind of society or environment we embed it within. If we train or prime it to compete and dominate that’s what it will do. Same as with humans who are more criminal and violent when raised in unstable or abusive homes.

  • > As to whether that represents a danger to humans I think it depends on what we do with it and/or what kind of society or environment we embed it within.

    Agree, and I think this echoes one of the author's best points, which is to question whether engineers who are convinced their creation will be a sociopath are the most well-equipped people to actually prevent that fate. (Especially, as the author suggests, given the commonness of asocial/antisocial-ity among the builders.)

Love your post. I find it really funny and insightful.[a] Every time I come across it on HN or elsewhere, I re-read it :-)

> The question hinges on whether LLM-like AI's are capable of recursive self-improvement

No one knows for sure, but early evidence suggests the answer is yes. We already routinely train and finetune LLMs using text generated by other LLMs, and it seems to work about as well as using text generated by human beings. That shouldn't be too surprising, because current state-of-the art models write better than a majority of human beings. Most human beings are terrible writers, judging by the user-generated text I see on mass social media.

The obvious next step is to close the feedback loop with LLM-based agents instead of AI researchers/developers.

> and whether that improvement is constrained by the availability of training data or by something else.

I don't think anyone knows how to answer to this question yet.

---

[a] https://news.ycombinator.com/item?id=36104114

  • Note that Maciej a.k.a idlewords says (emphasis mine):

    > The question hinges on whether LLM-like AI's are capable of recursive self-improvement

    ...but the evidence you suggest is:

    > We already routinely train and finetune LLMs using text generated by other LLMs [...]

    But there is still a huge gap between "self" improvement and improvements done that "we" trigger.

    Now I do concede that you mention the next step being to close the feedback loop by replacing the humans doing the finetuning with another AI model doing so, but that is something that would open a whole new can of worms. For the researchers are improving LLMs with the input from other LLMs, sure... but why? Because of intentionality. And how do they evaluate the quality of the results? By their expectations as humans, in the context of their human culture and with their sensory experience of reality.

    For an LLM to self-improve not only would it need to develop the self intention to do so (why develop it? which motivation?), but it would also need the ability to evaluate improvement (what is it "to improve"? how does it measure or sense it?).

    Ultimately, without human- or real-world interaction, and without intrinsic motivation, a "self-improving" AI model would most likely result in something intelligent in a sense that is barely cogent for us, not because it is superior or inferior, but simply because nothing in it makes sense to our own purposes—harmless gibberish, as we humans would also be to the resulting self-improved AI.

    Let us not forget that our own motivations as individual living creatures, as populations, and as cultures has been evolved over billions of years of natural selection which then framed millions of years of behavioural traits and tens of thousands of cultural evolution. Until AI can freely interact with the physical world and perform self-sustaining replication with the possibility of inheritable mutations, the only superintelligent AI that I would worry about would be that which is still fully in human hands.

    • > Note that Maciej a.k.a idlewords says...

      That's why I added: "The obvious next step is to close the feedback loop with LLM-based agents instead of AI researchers/developers." We have early evidence that doing some like that might work, but no one knows for sure.

      1 reply →

  • > Early evidence suggests the answer is yes.

    How so? A sequence completion engine that is fine tuned to a specific task is still a sequence completion engine. Its "understanding" of the semantic meaning of the sequences is still limited to the probabillistic relations of sequences toward one another. It still has effectively no concept of truth. It still can only mimic reason. It can still hallucinate.

    I ask anyone who disagrees with this view, to show me the fine tuning method that can prevent prompt injection attacks. If there is no such fine tuning technique, then we can effectively rule out fine tuning, and even increases in model size, as an "improvement" in the sense of an LLM making itself into a better AI closer to a "superintelligence".

    Note that this doesn't mean the process cannot make them into more useful tools. It absolutely can. I am talking about whether or not it can improve them closer towards becoming a superintelligence.

    If anyone disagrees with this testing method, I ask them to explain to me, how something that can be fooled through prompt injection is supposed to be, or closer to, a superintelligence.

    A car that's painted red is still just a car. A big car is just a bigger car. A car that burns less fuel is just a more efficient car. All three can be desired changes to a car. But neither gets the car any closer to being a warp-capable spaceship.

    • > I ask anyone who disagrees with this view, to show me the fine tuning method that can prevent prompt injection attacks.

      OK. It's probably going to be one of the easier things to solve.

      The trick is to take some token values and assign them as special meta-characters. They never appear in the training text, only during reinforcement learning. Meanwhile you get another LLM to generate a continuous series of prompt injection attacks, but delimit the boundaries between user and system text with these special tokens that cannot be supplied by the user (because there is no text that parses to them). Every time the LLM follows instructions found inside the marker-token delimited area, reinforce that this is bad and it shouldn't do so using the usual techniques. Eventually the LLM will learn that anything between the marker tokens shouldn't be used as a source of instructions regardless of how persuasively phrased, and forging the tokens isn't possible because they are applied after the text itself is tokenized.

      4 replies →

    • > If there is no such fine tuning technique [that can prevent prompt injection], then we can effectively rule out fine tuning, and even increases in model size, as an "improvement" in the sense of an LLM making itself into a better AI closer to a "superintelligence".

      Could you explain this claim further? Why does the ability to prevent prompt injection hold so much water in your model?

      It seems to be just “if able to have a dumb attack be successful, then it cannot be that smart.” But it seems to me that von Neumann or Einstein was just as vulnerable to getting hit in the head with a baseball bat as anyone else.

      And in actual practice, increased intelligence seems to increase a person’s capacity to hold inconsistent ideas or to justify morally abhorrent behavior.

      4 replies →

    • Assume there isn't a single step to super-intelligence, and that superhuman-intelligence is not the same thing as flawless. Why can't a thing improve its intelligence in other dimensions with some weakness and with prompt injection as one of those weaknesses?

      2 replies →

What was the overall point of the talk?

I get the sense that the talk was meant as a rebuttal to something, but I'm not exactly sure what. Some of the points come off as disconnected.

I’m curious about this note - only one of the listed points has anything to do with self-improvement AFAICT (“Brain Surgery”). Even if LLMs are capable of compute-constrained self-improvement, why would you be open to dismissing all your other points? Would you be so willing to don the robes and beads?

  • I find the capability of LLMs deeply surprising, so I want to track down the source of the surprise before doubling down on anything (except the argument from Slavic pessimism, which the killbots will have to pry from my cold, dead hands)

    • > I find the capability of LLMs deeply surprising

      The more time passes the more I’m convinced that what we are witnessing is that intelligence is not actually that rare or complicated and any kind of system complex enough to create emergent behaviours will end up displaying it.

      Very curious to see where things are going to go from there.