Comment by _wire_

1 year ago

>We don't fully understand why current LLMs are bad at these tasks.

In complete seriousness, can anyone explain why LLMs are good at some tasks?

LLMs are good at tasks that don't require actual understanding of the topic.

They can come up with excellent (or excellent-looking-but-wrong) answers to any question that their training corpus covers. In a gross oversimplification, the "reasoning" they do is really just parroting a weighted average (with randomness injected) of the matching training data.
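
To make that oversimplification concrete, here's a minimal sketch of just the sampling step, with made-up scores standing in for whatever the network actually computes; the point is only that the output is a weighted random draw over a training-shaped distribution, not a lookup of a single "correct" answer:

    import numpy as np

    # Hypothetical scores the model might assign to the next token after "The sky is".
    logits = {"blue": 4.0, "clear": 2.5, "falling": 0.5, "green": -1.0}

    def sample_next_token(logits, temperature=0.8):
        """Softmax over the scores, then a weighted random draw:
        the 'weighted average with randomness injected' part."""
        tokens = list(logits)
        scores = np.array([logits[t] for t in tokens]) / temperature
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        return np.random.default_rng().choice(tokens, p=probs)

    print(sample_next_token(logits))   # usually "blue", occasionally something else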

What they're doing doesn't really match any definition of "understanding." An LLM (and any current AI) doesn't "understand" anything; it's effectively no more than a really big, really complicated spreadsheet. And no matter how complicated a spreadsheet gets, it's never going to understand anything.

Not until we find the secret to actual learning. And increasingly it looks like actual learning probably relies on some of the quantum phenomena that are known to be present in the brain.

We may not even have the science yet to understand how the brain learns. But I have become convinced that we're not going to find a way for digital-logic-based computers to bridge that gap.

  • This is also why image generating models struggle to correctly draw highly variable objects like limbs and digits.

    They’ll be able to produce endless good-looking cardboard boxes, because those are simple enough to be represented reasonably well with averages of training data. Limbs and digits, on the other hand, have nearly limitless configurations, and as such require actual understanding (along with basic principles such as foreshortening and kinetics) to draw well without human guidance.

  • I would just add that I think I have encountered situations where knowing the weighted-average answer from the training data for a topic I didn't previously understand created better initial conditions for MY learning of that topic than not knowing it.

    The problem to me is we are holding LLMs to a standard of usefulness from science fiction and not reality.

    A new, giant set of encyclopedias has enormous utility, but we wouldn't hold it against the encyclopedias that they aren't doing the thinking for us or that they aren't 100% omniscient.

  • > What they're doing doesn't really match any definition of "understanding."

    What is the mechanistic definition of "understanding"?

  • What is your definition of understanding?

    Please show me where the training data exists in the model to perform this lookup operation you’re supposing. If it’s that easy I’m sure you could reimplement it with a simple vector database.

    Your last two paragraphs are just dualism in disguise.

    • I'm far from being an expert on AI models, but it seems you lack a basic understanding of how these models work. They transform data EXACTLY like spreadsheets do. You can implement those models in Excel, assuming there's no row or column limit (or that it's high enough). Of course it will be much slower than the real implementations, but the OP is right: LLMs are basically spreadsheets.

      The question is, wouldn't a brain qualify as a spreadsheet too? Do we know it can't be implemented as one? Well, maybe not; I'm not an expert on spreadsheets either, but spreadsheets don't allow circular references, and the brain does: you can have feedback loops in the brain. So even if the brain doesn't have the still-not-understood ingredient the OP suggests, it is still more powerful than current AI.

      BTW, this is one explanation of why AI fails at some tasks: ask an AI whether two words rhyme and it will be quite reliable. But ask it to give you word pairs that rhyme and it will often fail, because it won't run an internal loop trying candidate words and checking whether they rhyme (something like the sketch below). If an AI does succeed at rhyming, it's either because such word pairs were already in its training data from the get-go or because it's implemented to make multiple passes or something...
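
      Here's a crude sketch of the kind of generate-and-check loop I mean, with a deliberately naive "same ending" test standing in for a real rhyme check; nothing like this runs inside a single forward pass:

          # Deliberately naive stand-in for a real rhyme check: shared two-letter ending.
          def looks_like_rhyme(a, b):
              return a != b and a[-2:] == b[-2:]

          candidates = ["cat", "hat", "dog", "log", "tree", "free", "sun"]

          # Generate-and-check: propose pairs, keep only those that pass the check.
          rhyming_pairs = [(a, b)
                           for i, a in enumerate(candidates)
                           for b in candidates[i + 1:]
                           if looks_like_rhyme(a, b)]
          print(rhyming_pairs)  # [('cat', 'hat'), ('dog', 'log'), ('tree', 'free')]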

      2 replies →

    • People are confusing the limited computational model of a transformer with the "Chinese room argument", which leads to unproductive simultaneous debates of computational theory and philosophy.

      1 reply →

    • A transformer is not a simple vector database doing a simple lookup operation. It does lookups on patterns, not words, and it learns those patterns from the dataset. If your pattern is not there, it will hallucinate or give you the wrong answer, as GPT-4 and Opus have done for me hundreds of times already.

  • > the "reasoning" they do is really just parroting a weighted average (with randomness injected) of the matching training data

    Perhaps our brains are doing exactly the same, just with more sophistication?

    • No.

      We know how current deep learning neural networks are trained.

      We know definitively that this is not how brains learn.

      Understanding requires learning. Dynamic learning. In order to experience something, an entity needs to be able to form new memories dynamically.

      This does not happen anywhere in current tech. It's faked in some cases, but no, it doesn't really happen.

      12 replies →

    • Every single discussion of ‘AGI’ has endless comments exactly like this. Whatever criticism is made of an attempt to produce a reasoning machine, there’s always inevitably someone who says ‘but that’s just what our brains do, duhhh… stop trying to feel special’.

      It’s boring, and it’s also completely content-free. This particular instance doesn’t even make sense: how can it be exactly the same, yet more sophisticated?

      Sorry.

      11 replies →

Yes:

An LLM isn't a model of human thinking.

An LLM is an attempt to build a simulation of human communication. An LLM is to language what a forecast is to weather. No amount of weather data is actually going to turn that simulation into snow, no amount of LLM data is going to create AGI.

That having been said, better models (smaller, more flexible ones) are going to result in a LOT of practical uses that have the potential to make our day-to-day lives easier (think of a digital personal assistant that has current knowledge).

  • Great comment. Just one thought: Language, unlike weather, is meta-circular. All we know about specific words or sentences is again encoded in words and sentences. So the embedding encodes a subset of human knowledge.

    Hence, an LLM is predicting not only language but language with some sort of meaning.

    • That re-embedding is also encoded in weather. It is why perfect forecasting is impossible, and why we talk about the butterfly effect.

      The "hallucination problem" is simply the tyranny of Lorenz... one is not sure if a starting state will have a good outcome or swing wildly. Some good weather models are based on re-runing with tweaks to starting params, and then things that end up out of bounds can get tossed. Its harder to know when a result is out of bounds for an LLM, and we dont have the ability to run every request 100 times through various models to get an "average" output yet... However some of the reuse of layers does emulate this to an extent....

  • Ugh. Really? Those "simulated water isn't wet" "arguments" (when applied to cognition) have been knocked down so many times that it hurts even to look at them.

    • No, simulated water isn't wet.

      But an LLM isn't even trying to simulate cognition. It's a model that predicts language, and it has all the problems of a predictive model... the "hallucination" problem is just the tyranny of Lorenz.

      4 replies →

LLMs are a compressed and lossy form of our combined writing output, which, it turns out, is structured similarly enough to make new combinations of text seem reasonable, even enough to display simple reasoning. I find it useful to think “what can I expect from speaking with the dataset of combined writing of people”, rather than treating a basic LLM as a mind.

That doesn’t mean we won’t end up approximating one eventually, but it’s going to take a lot of real human thinking first. For example, ChatGPT writes code to solve some questions rather than reasoning about them from text (roughly the pattern sketched at the end of this comment). The LLM is not doing the heavy lifting in that case.

Give it (some) 3D questions, or anything where there aren’t massive textual datasets, and you often need to break out to specialised code.

Another thought I find useful is that it considers its job done when it has produced enough reasonable tokens, not when it has actually solved a problem. You and I would continue to ponder the edge cases. It’s just happy if there are 1000 tokens that look approximately like its dataset. Agents make that a bit smarter, but they’re still limited by the goal of being happy when each has produced the required token quota, missing, e.g., implications that we’d see instantly. Obviously we’re smart enough to keep filling those gaps.
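
Roughly the "break out to code" pattern I mean, with a hypothetical ask_llm() standing in for the model call; the LLM only writes the little program, and the interpreter does the actual arithmetic:

    def ask_llm(prompt):
        """Hypothetical model call: returns a small Python program
        instead of answering the question in prose."""
        return "result = sum(n * n for n in range(1, 101))"  # canned example reply

    def solve_with_code(question):
        code = ask_llm(f"Write Python that computes: {question}")
        namespace = {}
        exec(code, namespace)          # the interpreter does the actual work
        return namespace["result"]

    print(solve_with_code("the sum of the squares of 1..100"))  # 338350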

  • "I find it useful to think “what can I expect from speaking with the dataset of combined writing of people”, rather than treating a basic LLM as a mind."

    I've been doing this as well, mentally I think of LLMs as the librarians of the internet.

    • They're bad librarians. They're not bad, they just do a bad job of being librarians, which is a good thing! They can't quite tell you the exact quote, but they do recall the gist; they're not sure it was Gandhi who said that thing, but they think he did; it might be in this post, or perhaps one of these. They'll point you to the right section of the library to find what you're after, but make sure you verify it!

      1 reply →

I'd guess it's because the Transformer architecture is (I assume) fairly close to the way our brain learns and produces language - a similar hierarchical approach and perhaps a similar type of inter-embedding, attention-based copying?

Similar to how CNNs are so successful at image recognition: they also roughly follow the way we do it.

Other seq2seq language approaches work too, but not as well as Transformers, which I'd guess is due to Transformers better matching our own inductive biases, maybe because of the specific form of attention.
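
For the curious, the "specific form of attention" here is roughly scaled dot-product attention; a minimal numpy sketch of that single operation, leaving out the learned projections and multiple heads:

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Each query position takes a weighted average of the value vectors,
        weighted by how well its query matches each key."""
        scores = Q @ K.T / np.sqrt(Q.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
        return weights @ V

    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((5, 16)) for _ in range(3))  # 5 tokens, dim 16
    print(scaled_dot_product_attention(Q, K, V).shape)          # (5, 16)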

> why LLMs are good at some tasks?

Like how we explain humans doing tasks -- they evolved to do that.

I believe this is a non-answer, but if we are satisfied with that non-answer for humans, why not for LLMs?

  • I would argue that we are not satisfied with that answer for humans either.

If you look at transfer learning, I think that is a useful lens through which to understand task-specific application, and hence why LLMs excel at some tasks and not others.

Tasks are specialised for using the training corpus, the attention mechanisms, the loss functions, and such.

I'll leave it to others to expand on actual answers, but IMO focusing on transfer learning helps one understand how an LLM does inference.
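
A bare-bones illustration of what I mean, in numpy: the "pretrained" features are frozen and only a small task-specific head is fit on top. Everything here, including the random "pretrained" weights and the made-up task, is of course a stand-in:

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in for a pretrained network: a frozen feature extractor (never updated).
    W_frozen = rng.standard_normal((32, 8))

    # Tiny made-up downstream task.
    X = rng.standard_normal((200, 32))
    y = (X[:, 0] > 0).astype(float)

    Phi = np.tanh(X @ W_frozen)            # reuse the "pretrained" features as-is

    # Fit only a small task-specific head (logistic regression) on top.
    w, b = np.zeros(8), 0.0
    for _ in range(2000):
        p = 1 / (1 + np.exp(-(Phi @ w + b)))
        w -= 0.1 * Phi.T @ (p - y) / len(y)
        b -= 0.1 * (p - y).mean()

    print("head-only accuracy:", ((p > 0.5) == y).mean())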