
Comment by awongh

2 years ago

I’m not an expert, but I suspect this lack of correctness in these models might be fundamental to how they work.

I suppose there are two possible solutions: one is a new training or inference architecture that somehow understands “facts”. I’m not an expert, so I’m not sure how that would work, but from what I understand about how a model generates text, “truth” can’t really be an element of the training or inference that affects the output.

The second would be a technology built on top of the inference to check correctness, some sort of complex RAG. Again, I’m not sure how that would work in a real-world way.

I say it might be fundamental to how the model works because, as someone pointed out below, the word “material” could be interpreted as including the air inside the duck. The model’s answer was correct in a human sort of way, or, to be more specific, in a way that is consistent with how a model actually produces an answer: it outputs in the context of the input. If you asked it whether PVC is heavier than water, it would answer correctly.
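
Rough numbers to make that “material vs. object” distinction concrete (the density figures and duck dimensions below are ballpark values I made up for illustration, not real material data):

```python
# Approximate densities in g/cm^3 (ballpark figures only).
water_density = 1.0
pvc_density = 1.4          # solid PVC is denser than water, so a solid lump sinks

# A hollow duck: a thin PVC shell enclosing air (made-up example dimensions).
shell_volume = 20.0        # cm^3 of actual PVC
air_cavity_volume = 180.0  # cm^3 of enclosed air
total_volume = shell_volume + air_cavity_volume

mass = pvc_density * shell_volume      # the air's mass is negligible
average_density = mass / total_volume  # density of the *object*, not the material

print(pvc_density > water_density)     # True: the material is heavier than water
print(average_density < water_density) # True: the object still floats
```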

Because language itself is inherently ambiguous and the model doesn’t actually understand anything about the world, it might turn out that there’s no universal way for a model to know what’s true or not.

I could also see a version of a model that is “locked down” and can verify the correctness of its statements, but in a way that limits its capabilities.

> this lack of correctness in these models might be fundamental to how they work.

Is there some sense in which this isn't obvious to the point of triviality? I keep getting confused because other people seem to keep being surprised that LLMs don't have correctness as a property. Even the most cursory understanding of what they're doing makes clear that it is, fundamentally, predicting words from other words. I am also capable of predicting words from other words, so I can guess how well that works. It doesn't seem to include correctness even as a concept.

Right? I am actually genuinely confused by this. How is it that people think it could be correct in a systematic way?

  • I think very few people on this forum believe LLMs are correct in a systematic way, but a lot of people seem to think there's something more than predicting words from other words.

    Modern machine learning models contain a lot of inscrutable inner layers, with far too many billions of parameters for any human to comprehend, so we can only speculate about what's going on. A lot of people think that, in order to be so good at generating text, there must be a bunch of understanding of the world in those inner layers.

    If a model can write convincingly about a soccer game, producing output that's consistent with the rules, the normal flow of the game and the passage of time - to a lot of people, that implies the inner layers 'understand' soccer.

    And anyone who noodled around with the text prediction models of a few decades ago (Markov chains, Bayesian text processing, sentiment detection and things like that) can see that LLMs are massively, massively better than those traditional ways of predicting the next word.
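
    For a rough sense of what that older, purely statistical “predict the next word” approach looked like, here's a toy sketch (the corpus and chain order are made up for illustration, not any real system):

    ```python
    import random
    from collections import defaultdict

    # Tiny toy corpus; a real Markov text model would be built from far more text.
    corpus = "the duck floats on the water because the duck is hollow".split()

    # First-order Markov chain: record which words were seen following each word.
    transitions = defaultdict(list)
    for prev, nxt in zip(corpus, corpus[1:]):
        transitions[prev].append(nxt)

    def generate(start, length=8):
        """Sample a short continuation by repeatedly picking an observed follower."""
        word, out = start, [start]
        for _ in range(length):
            followers = transitions.get(word)
            if not followers:
                break
            word = random.choice(followers)
            out.append(word)
        return " ".join(out)

    print(generate("the"))
    ```

    Nothing in that sketch knows anything about ducks or water, and yet scaling the same “next word from previous words” idea up to billions of parameters is what produces the qualitatively better output described above.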

  • > Is there some sense in which this isn't obvious to the point of triviality?

    This is maybe a pedantic "yes", but it is also extremely relevant to the outstanding performance we see in tasks like programming. The issue is primarily the size of the correct output space (that is, the output space we are trying to model) and how that relates to the number of parameters. Basically, there is a fixed upper bound on the amount of complexity that can be encoded by a given number of parameters (obvious in principle, but we're starting to get some theory about how this works). Simple systems, or rather systems with simple rules, may be below that upper bound, and correctness is achievable. For more complex systems (relative to the parameter count), the model will still learn an approximation, but error is guaranteed.

    I am speculating now, but I seriously suspect that the space of not only one or more human languages but also every fact we would want to encode into one of these models is far too big for correctness to ever be possible without RAG. At least without some massive pooling of compute, which may not be out of the question long term, but would likely never be intended for individual use.

    If you're interested, I highly recommend checking out some of the recent work around monosemanticity for a sense of what fleshing out the relationship between model size and complexity looks like in the near term.

  • Just to play devil’s advocate: we can train neural networks to model some functions exactly, given sufficient parameters. For example, simple functions like ax^2 + bx + c.

    The issue is that “correctness” isn’t a differentiable concept. So there’s no gradient to descend. In general, there’s no way to say that a sentence is more or less correct. Some things are just wrong. If I say that human blood is orange, that’s not more incorrect than saying it’s purple.
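
    To make the contrast concrete, here's a throwaway sketch (plain numpy, coefficients picked arbitrarily) of fitting y = ax^2 + bx + c by gradient descent: the squared error is differentiable in (a, b, c), so there is a gradient to follow, which is exactly what a “correctness” objective over sentences doesn't give you.

    ```python
    import numpy as np

    # "True" quadratic we want to recover (coefficients chosen arbitrarily).
    a_true, b_true, c_true = 2.0, -3.0, 0.5
    x = np.linspace(-2, 2, 200)
    y = a_true * x**2 + b_true * x + c_true

    params = np.zeros(3)  # learned [a, b, c], starting from zero
    lr = 0.01

    for _ in range(5000):
        a, b, c = params
        err = (a * x**2 + b * x + c) - y
        # Mean squared error is smooth in the parameters, so we get exact gradients.
        grad = np.array([
            2 * np.mean(err * x**2),  # d(MSE)/da
            2 * np.mean(err * x),     # d(MSE)/db
            2 * np.mean(err),         # d(MSE)/dc
        ])
        params -= lr * grad

    print(params)  # converges to roughly [2.0, -3.0, 0.5]
    ```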

  • Because it is assumed that it can think and/or reason. In this case, that means knowing the concept of density, knowing the density of a material, detecting the material from an image, and detecting what object is in the image. And, most importantly, knowing that this object is not solid, because otherwise it could not float.

  • Maybe you’re simplifying a bit what “guessing words from other words” means. HOW you guess is what’s mysterious to many: you can guess words from other words out of linguistic habit, from a model of how other people expect you to predict, from a feedback loop that helps you do it better over time when you see people are “meh” at your bad predictions, etc.

    So if the chatbot is used to talking, knows what you’d expect, and listens to your feedback, why wouldn’t it also want to tell the truth like you would instinctively, even if only as a best effort?

    Sadly, the chatbot doesn’t yet really care about the game it’s playing; it doesn’t want to make it interesting; it’s just like a slave producing minimal, low-effort outputs. I’ve talked to people exploited for money in dark places, and when they “seduce” you, they talk like a chatbot: most of it is a lie, they just have to convince you a little bit to go their way, they pretend to understand or care about what you say, but at the end of the day the goal is for you to pay. Like the chatbot.

  • Yeah. I think there’s some ambiguity around the meaning of reasoning, because it is a kind of reasoning to say a duck’s material is less dense than water. In a way it has reasoned that out, and it might actually say something about the way a lot of human reasoning works... (especially if you’ve ever listened to certain people talk out loud and said to yourself... huh?)