
Comment by nicklecompte

1 year ago

There is a reason Borges's Library of Babel contained all combinatorially possible texts, with almost all of them being pure gibberish. Borges was wise enough to understand that the following is meaningless, even for a story about a magic library:

"Imagine a collection that does not only contain all the texts produced by humans, but, well beyond what has already been physically written, also encompasses all the texts that a human could read and at least superficially comprehend."

To be clear, this is a horrifically dishonest metaphor for LLMs. IMO the most glaring flaw in the technology is that LLMs can't handle new ideas that don't appear in the training set. It is true that ChatGPT doesn't encounter this use case very often, because it mostly handles trivialities. But it does mean that this entire argument is navel-gazing speculation.

The bigger problem is that the entire idea of "all texts a human could superficially comprehend" is meaningless, and the paper proceeds to reason from this utter fallacy. The beauty of Borges's Library of Babel was that he realized humans are capable of "superficially comprehending" any text, even one created by a uniform random ASCII generator. This is the basis of numerology, and it is why Borges's story included the superstitious cult behavior of people destroying and/or sanctifying "meaningful" gibberish. If we have a good enough reason to find meaning in a text, we'll find it. Humans don't actually rely on symbolic reasoning; we just use symbols for communication and organization. Give us the symbols and we will reason about them, using cognition that is far too squishy to fit in a book. It's especially dangerous when the symbols obey human grammar and imitate the social tones of authoritativeness, mysticism, etc.

And then there's...this:

"The invention of a machine that can not only write stories but also all their variations is thus a significant milestone in human history."

I am not a writer. But speaking as a member of Homo sapiens, it is genuinely insulting to call ChatGPT a machine that can write "all variations" of a story. This paper needed to be reviewed by a serious writer or philosopher before being put on the arXiv.

What are some examples of "new ideas"? I'm having a hard time imagining an idea that can't be expressed as a combination of existing concepts.

Better concepts can arise when we make discoveries about reality (which takes experimentation), but there's a lot more juice to squeeze from the concepts we currently have.

  • "an idea that can't be expressed as a combination of existing concepts."

    The problem is that if an LLM hasn't been pretrained on the specific idea, it won't have a grasp of what the correct concepts are to make the combination. It will be liable to substitute more "statistically likely" concepts, but since that statistic is based on a training set where the concept didn't exist, its estimate of "likely" is flawed.

    One good example is patents (https://nitter.net/mihirmahajan/status/1731844283207229796): LLMs can imitate appropriate prose, but they really struggle to maintain semantic consistency when handling new patents for inventions that, by definition, wouldn't have appeared in the training set. This extends to almost any writing: if you are making especially sophisticated or nuanced arguments, LLMs will struggle to rephrase them accurately.

    (Note that GPT-4 is still extremely bad at document summarization, even for uninteresting documents: your Q3 PnL number is not something that appeared in the training set, and GPT-4 is liable to screw it up by substituting a "statistically likely" number.)

    In my experience GPT-3.5 is extremely bad at F#: although it can do simple tasks like "define a datatype that works for such-and-such," it is much less proficient at basic functional programming in F# than it is in Haskell - far more likely to make mistakes, or even to identifiably plagiarize from specific GitHub repos (even my own). That's because there are a ton of Functional Programming 101 tutorials in Haskell, but very few in F#. I am not sure about GPT-4; it does seem better, but I haven't tested it as extensively.
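
    (Purely for concreteness, here is a hypothetical sketch of the kind of "simple datatype task" versus "basic FP 101" I mean; the names and types are mine, not anything a model produced:)

        // A small discriminated union - the "define a datatype" kind of task.
        type Shape =
            | Circle of radius: float
            | Rectangle of width: float * height: float

        // Basic FP 101: pattern matching, then folding over a list.
        let area shape =
            match shape with
            | Circle r -> System.Math.PI * r * r
            | Rectangle (w, h) -> w * h

        let totalArea shapes = shapes |> List.sumBy area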

    • The problem is that if an LLM hasn't been pretrained on the specific idea, it won't have a grasp of what the correct concepts are to make the combination

      And this isn't true of humans?

  • They cannot be expected to produce useful new ideas because those ideas sit in lacunae in their probability distribution: even though the novel combination of existing ideas is possible (and combination isn't the only route to new ideas: neologisms exist), the LLM has never seen it, and so will (probabilistically) never produce it, because to the model it is equivalent to nonsense.

    The exception to this is if the new ideas are somehow present in the structure of language and are internalized and/or presented in an emergent form.