Comment by Yizahi

5 days ago

LLMs can generate anything by design. LLMs can't understand what they are generating, so it may be true, it may be wrong, it may be novel, or it may be a known thing. It doesn't discern between them, just looks for the best statistical fit.

The core of the issue lies in our human language and our human assumptions. We humans have implicitly assigned the phrases "truly novel" and "solving an unsolved math problem" a certain meaning in our heads. Some of us, at least, think that truly novel means something genuinely new and important, something significant. Like, I don't know, finding a high-temperature superconductor formula or creating a new drug, etc. Something which involves real intelligent thinking, not randomizing possible solutions until one lands. But formally there can be a truly novel way to pack the most computer cables into a drawer, a truly novel way to tie shoelaces, or indeed a truly novel way to solve some arbitrary math equation with enormous numbers. Which are formally novel things, but we really never needed any of that, and so relegated these "issues" to the deepest backlog possible. Utilizing LLMs we can scour for solutions to many such problems, but they were not that impressive in the first place.

> It doesn't discern between them, just looks for the best statistical fit

Of course at the lowest level, LLMs are trained on next-token prediction, and on the surface, that looks like a statistics problem. But this is an incredibly reductionist viewpoint and I don't see how it makes any empirically testable predictions about their limits. LLMs 'learned' a lot of math and science in this way.
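To make the "best statistical fit" framing concrete, here is a minimal sketch (the toy corpus and the bigram model are purely illustrative, nothing like how a real LLM is trained): a model that literally picks the most frequent continuation it has seen.

```python
from collections import Counter, defaultdict

# Toy corpus; a real LLM sees trillions of tokens, not one sentence.
corpus = "the cat sat on the mat and the cat slept".split()

# Count how often each token follows each preceding token.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def best_statistical_fit(prev):
    """Return the most frequent continuation of `prev`."""
    return follows[prev].most_common(1)[0][0]

print(best_statistical_fit("the"))  # "cat" follows "the" twice, "mat" once
```

The debate is whether "just statistics" stays an accurate description once the counting model is replaced by a deep network trained on vastly more data, or whether it becomes the reductionist viewpoint described above.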

> "truly novel" and "solving an unsolved math problem"

OK, again: if novelty lies on a continuum, where do you draw the line? And why is it correct to draw it there and not somewhere else? It seems like you are just naming exceptionally hard research problems.

  • > LLMs 'learned' a lot of math and science in this way.

    Did they? Or is it begging the question?

    • This is why I put 'learned' in quotes. They went from not being able to solve algebra problems or produce basic steps of scientific reasoning to being able to. Operationally, that is what I mean by learning, and they unambiguously do it.

If LLMs can come up with formally truly novel solutions to things, and you have a verification loop to ensure that they are actual proper solutions, I don't understand why you think they could never come up with solutions to impressive problems, especially considering the thread we are literally on right now. At this point, the claim that they will always be limited to truly novel solutions to uninteresting problems seems like a pure assertion.
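The generate-and-verify pattern this argument leans on is easy to sketch. Here a blind random proposer stands in for the generator and a trivial arithmetic check stands in for the verifier; both are my own placeholders, chosen only to show the loop's shape:

```python
import random

def verify(x, y, z):
    # The cheap, mechanical check that filters slop from genuine solutions.
    return x > 0 and y > 0 and x**2 + y**2 == z**2

def propose(rng):
    # Stand-in for the generator: blind random guessing.
    return rng.randint(1, 50), rng.randint(1, 50), rng.randint(1, 50)

def search(seed=0, max_tries=100_000):
    """Propose candidates until one passes verification."""
    rng = random.Random(seed)
    for tries in range(1, max_tries + 1):
        candidate = propose(rng)
        if verify(*candidate):
            return candidate, tries
    return None, max_tries

solution, tries = search()
print(solution, tries)  # a valid Pythagorean triple, after many wasted proposals
```

The open question in the thread is not whether this loop works (it does, whenever a verifier exists), but how far verifiers extend beyond easily checkable domains like this one.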

  • "Truly novel" is fast becoming a No True Scotsman.

    • No True Novelty, No True Understanding, etc.

      The problem with these bromides is not that they're wrong, it's that they're not even wrong. They're predictive nulls.

      What observable differences can we expect between an entity with True Understanding and an entity without True Understanding? It's a theological question, not a scientific one.

      I'm not an AI booster by any means, but I do strongly prefer we address the question of AI agent intelligence scientifically rather than theologically.


  • It probably can, but it won't realize that, and it won't be efficient at it. An LLM can shuffle tokens for an enormous number of tries and eventually come up with something super impressive, though as you yourself mentioned, we would need a mandatory verification loop to filter slop from good output, and how to build one outside of some limited areas is a big question. But assume we have these verification loops and are running LLMs for years to look for something novel. It's like running the energy grid of a small country to change a few dozen database entries per hour. Yes, we can do that, but it's a weird thing to do. It is novel, no argument about that. Just inefficient.

    We never had a big demand to define how humans are intelligent or conscious, etc., since it is too hard, and it was relegated to some frontier researchers. With LLMs we now do have such demand, but the science wasn't ready. So we are all collectively searching in the dark, trying to define whether we are different from these programs and, if so, how. I certainly can't do that. I do know that LLMs are useful, but I also suspect that AI (aka AGI nowadays) has not yet been reached.

    • How can people look at

      - clear generalizability

      - insane growth rates (go back and look at where we were maybe 2 years ago and then consider the already signed compute infrastructure deals coming online)

      And still say with a straight face that this is some kind of parlor trick or monkeys with typewriters?

      We don't need to run LLMs for years. The point is: look at where we are today, and consider that performance gets ~10x cheaper every year.

      LLMs and agentic systems are clearly not monkeys with typewriters regurgitating training data, and their capabilities continue to grow at extremely fast rates.


    • > We never had a big demand to define how humans are intelligent or conscious, etc., since it is too hard, and it was relegated to some frontier researchers. With LLMs we now do have such demand, but the science wasn't ready. So we are all collectively searching in the dark, trying to define whether we are different from these programs and, if so, how. I certainly can't do that. I do know that LLMs are useful, but I also suspect that AI (aka AGI nowadays) has not yet been reached.

      Alternative perspective: the science may not have been ready, so instead we brute-forced the problem through the training of LLMs. Consider what the overall goal function of LLM training is: predicting tokens that continue the given input in a way that makes sense to humans - in the fully general meaning of that statement.

      It's a single training process that gives LLMs the ability to parse plain language - even if riddled with 1337-5p34k, typos, grammar errors, or mixed languages - and extract information from it, or act on it; it's the same single process that makes them equally good at writing code and poetry, at finding bugs in programs, inconsistencies in data, corruptions in images, possibly all at once. It's what makes LLMs good at lying and at spotting lies, even if the input is a tree of numbers.

      (It's also why "hallucinations" and "prompt injection" are not bugs, but fundamental facets of what makes LLMs useful. They cannot and will not be "fixed", any more than you can "fix" humans to be immune to confabulation and manipulation. It's just the nature of fully general systems.)

      All of that, and more, is encoded in this simple goal function: if a human looks at the output, will they say it's okay or nonsense? We just took that and threw a ton of compute at it.
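That goal function has a standard mathematical form: minimize the cross-entropy between the model's next-token distribution and what actually came next. A minimal sketch, with a made-up toy distribution standing in for a real model's output:

```python
import math

def next_token_loss(predicted_probs, actual_next_token):
    """Cross-entropy for a single step: the entire training signal is
    'how much probability did you put on the token that actually came next?'"""
    return -math.log(predicted_probs[actual_next_token])

# Toy distribution over three continuations of "the cat sat on the ...".
probs = {"mat": 0.7, "hat": 0.2, "moon": 0.1}
loss_good = next_token_loss(probs, "mat")   # low loss: model expected this
loss_bad = next_token_loss(probs, "moon")   # high loss: model was surprised
print(round(loss_good, 3), round(loss_bad, 3))  # 0.357 2.303
```

Everything the comment describes - parsing mangled input, writing code and poetry, spotting lies - is pressure applied through this one scalar, summed over every position in the training data.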


> It doesn't discern between them, just looks for the best statistical fit.

Why is this not true for humans?

  • We can't tell yet if that is true, partially true, or false for humans. We do know that an LLM can't do anything else besides that (I mean as a fundamental operating principle).

    • Why is it important? “Statistical fit” is what you want… not understanding this is indicative of a limited understanding of what statistics is. What do you think it means to truly understand something? I don’t get it: read Probability Theory by Jaynes. It doesn’t really matter whether the brain does Bayesian updates, but that’s what’s optimal…
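The Bayesian-update claim is easy to make concrete (the coin hypotheses and numbers below are made up for illustration): posterior ∝ prior × likelihood, renormalized.

```python
def bayes_update(prior, likelihood):
    """One Bayesian update over a discrete set of hypotheses:
    posterior is proportional to prior * likelihood, renormalized to sum to 1."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnorm.values())
    return {h: p / total for h, p in unnorm.items()}

# Two hypotheses about a coin, then one observed heads.
prior = {"fair": 0.5, "biased": 0.5}
likelihood_heads = {"fair": 0.5, "biased": 0.9}
posterior = bayes_update(prior, likelihood_heads)
print(posterior)  # belief shifts toward "biased" after seeing heads
```

The point being made: an update rule can be optimal in this precise sense regardless of whether the mechanism computing it "understands" anything.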


> Which are formally novel things, but we really never needed any of that

The history of science and maths is littered with seemingly useless discoveries being pivotal as people realised how they could be applied.

It's impossible to tell what we really "need".

> LLMs can't understand what they are generating

You don't understand what "understanding" means. I'm sure you can't explain it. You are probably just hallucinating the feeling of understanding it.

> Some of us at least, think that truly novel means something truly novel and important, something significant. Like, I don't know...

Yeah.