Comment by Nevermark
1 day ago
> My intention is to highlight the fact that LLM conversations are cleverly disguised examples of sentence continuation
Regardless of bigger issues, this kind of statement reveals a deep misunderstanding.
Problem type does not limit problem complexity. Nor does problem type limit solution complexity or power.
If a machine has to learn to understand humans to complete text, then that is what it has to do. And there is no theoretical or practical basis for suggesting that this is somehow "faking" understanding, just because of the form of original data streaming in and out.
Neither problem type, nor input/output structure, limit internal representations.
Understanding is learned from patterns in the data, not the gross form of the data. Does the data require an understanding of something to complete the task? Then that understanding will be what is optimized.
To the degree they are limited, it is for other reasons: resources such as compute, parameter count, or a lack of representative data. In the case of SOTA models, we know these are not the limits, a conclusion verified by the models' actual abilities.
Raphaël Millière has a very useful term for this kind of vacuous dismissal, the redescription fallacy (https://arxiv.org/pdf/2401.03910, page 9):
> Recent debates have been clouded by a misleading inference pattern, which we term the “Redescription Fallacy.” This fallacy arises when critics argue that a system cannot model a particular cognitive capacity, simply because its operations can be explained in less abstract and more deflationary terms. In the present context, the fallacy manifests in claims that LLMs could not possibly be good models of some cognitive capacity ϕ because their operations merely consist in a collection of statistical calculations, or linear algebra operations, or next-token predictions. Such arguments are only valid if accompanied by evidence demonstrating that a system, defined in these terms, is inherently incapable of implementing ϕ. To illustrate, consider the flawed logic in asserting that a piano could not possibly produce harmony because it can be described as a collection of hammers striking strings, or (more pointedly) that brain activity could not possibly implement cognition because it can be described as a collection of neural firings. The critical question is not whether the operations of an LLM can be simplistically described in non-mental terms, but whether these operations, when appropriately organized, can implement the same processes or algorithms as the mind, when described at an appropriate level of computational abstraction.
> or (more pointedly) that brain activity could not possibly implement cognition because it can be described as a collection of neural firings.
This sounds like a dismissal of the argument through a characterized straw man.
That is, it seems that reducing the complexity of the brain to a "collection of neural firings" glosses over far more of what is involved than describing neural networks as a "collection of statistical calculations" does.
I too believe LLMs will grow in complexity, but presently I can not even fathom how they can be compared to the complexity of a system such as the human brain.
Complex processes don't necessarily require complex substrates, if that's what you mean.
> presently I can not even fathom how they can be compared to the complexity of a system such as the human brain
Totally understandable; I don't think we can fully understand the human brain, using the human brain. We can understand its principles (firings and chemistry, structure and specialized areas, etc) but otherwise it's a capacity problem.
And while I can't fully understand myself, let alone another person, I definitely enjoy talking with people and sharing thoughts that I realize I wouldn't have had on my own.
> In the present context, the fallacy manifests in claims that LLMs could not possibly be good models of some cognitive capacity because their operations merely consist in a collection of statistical calculations, or linear algebra operations, or next-token predictions
Nobody actually makes this argument though.
If you want examples of this, see the recent book "The AI Con"
https://www.goodreads.com/en/book/show/217432753-the-ai-con
which describes LLMs as "souped-up autocomplete", complex statistics that cannot truly understand anything. A more recent example is this paper:
https://zenodo.org/records/20071869
which says,
> [LLMs], as turbo-charged statistical models (recall their formal relation to logistic regression) can only but provide correlations.
And, of course, the Stochastic Parrot paper is the classic example in this area. It is from 5 years ago, but "LLMs only do statistics / can't understand" is very much alive and active among academics, even if it is a minority position.
Are you serious? I hear it every single day, especially from computer scientists. There are top ranked posts here on HN _today_ with this argument.
I think, for me, the thing is that when you do basic ML, you discover that ML will very often find data patterns that fit the goal but do not correspond to a real mechanism.
So, I think there is a flaw in the logic of saying that human text has a pattern of "consciousness mechanism" and therefore an LLM will learn a "consciousness mechanism" in order to return sentence continuation that is convincing. There are probably tons of data patterns that an LLM can learn from to reproduce a convincing sentence continuation without having to learn the specific mechanism that is "conscious".
For me, one element that shows this is the case is the absence of a world model (or a "human-like" world model) despite the fact that the sentence continuation is convincing. If the only way to produce convincing sentence continuation were by "simulating a brain", that would not explain the first LLMs from several years ago (before the extra layers of RLHF, ...). They were able to hold quite convincing conversations on a lot of non-trivial topics, and yet failed on some aspects that should have been basic for a system trained to work like a human brain. It shows that it is possible to "cleverly disguise examples of sentence continuation" without having to build elements that one expects in a conscious being.
I didn't make the claim that a model can learn consciousness.
Understanding is not consciousness.
Their training is all about understanding. There is nothing in their architecture or training that credibly optimizes for rich self-awareness.
Given non-persistent experience, non-continuous operation, no ability to build up generalizations and aggregate experience of their own self-awareness over time, they seem to be structurally designed to not have consciousness.
This is a case where acting is very credible. Understanding of other's consciousness, in a functional and third party sense, isn't a substrate for personal experience.
In stark contrast, humans develop consciousness gradually over continuous time with persistent aggregation of experience. By the time we can recognize our own consciousness in the abstract, and reason about it, we have had it for some time.
I use "consciousness" because it's the point of the original argument, but in fact, I think my whole comment still works well if you replace "consciousness" with "understanding".
My point is that the fact that AI can convincingly reproduce human sentence continuation does not imply that the AI has no choice but to end up using a mechanism that "understands", rather than having just learned data patterns that are very effective at faking human sentence continuation but are meaningless in terms of understanding the concepts.
And I think that if the only way for AI to convincingly reproduce human sentence continuation were to end up in a configuration that uses the "understand" mechanism, the first LLMs would not have been so good at sounding human and yet so bad at basic "understanding" tests.
I think Searle's Chinese Room argument refutes this. LLMs are simply manipulating symbols, they do not have semantic understanding. This is why hallucinations exist. And Searle's argument extends even further than LLMs.
You are basically arguing for a functional account of consciousness, but things like this have been debated for literally decades/centuries in philosophy.
I’m also fixated on the term “experience” in the context of this debate. To me, consciousness is something that one “experiences”, and the two concepts are intertwined.
I am far from convinced that the training and inference regimes of LLMs would qualify as “experience” by any sense of the word.
Now, if we hooked up a plethora of audiovisual and tactile sensors with live feedback directly to a neural network rich with transformers, that was always powered on and fully autonomous, we may be getting there. But we’d probably also be on the verge of manmade horrors beyond our comprehension.
Biological rodent neural networks in a Petri dish stimulated by electrical impulses - more or less conscious than LLMs?
Human on life support, unable to respond to any external stimuli, “braindead” - more or less conscious than LLMs?
> a flaw in the logic [...] mechanism
Similar to: "Birds fly, my spinning helical device flies, therefore we've started to replicate how birds fly."
> without having to build elements that one expects in a conscious being
One of the elements I expect in a conscious being is that you can't rewrite it by changing the introductory paragraph.
When it comes to LLMs, almost every "mind" we humans perceive is a fictional character in an LLM-generated story-document, one we are either reading or which is being "acted" at us by regular code. Our own instinct for pareidolia and simulating/inferring other minds is very strong, which means we should require really good evidence/logic to counter our instincts.
Even if one believes the LLM has a single "real mind" as an author of every document... what evidence do we have that it is conscious or "self-inserting" itself as one of the characters in the document?
>One of the elements I expect in a conscious being is that you can't rewrite it by changing the introductory paragraph.
If we had enough knowledge of the workings of the human brain, you could alter the perception of every single memory you've ever had. And limited versions of this already happen all the time. Human memory is notoriously unreliable for a reason.
Are you aware of the Recovered Memory Therapy scandals of the 80s/90s? Boy, did that ruin a lot of lives. You can rewrite a human by changing their 'introductory paragraph'. It's just not as accessible.
> I think there is a flaw in the logic of saying that human text has a pattern of "consciousness mechanism" and therefore an LLM will learn a "consciousness mechanism" in order to return sentence continuation that is convincing.
There is no independent "consciousness mechanism" that one might imagine humans have learned or evolved for its own sake. Evolution learns various solutions to optimization problems, and so if consciousness evolved then it was either useful instrumentally, or it is a byproduct of some organization that is useful instrumentally. The point is that as a solution to certain kinds of optimization problems, consciousness can conceivably be the solution to the optimization problem of predicting the next token of text written by humans who themselves have complex phenomenology. There is nothing that a priori constrains token prediction from the domain of consciousness.
>For me, one element that shows this is the case is the absence of a world model (or a "human-like" world model) despite the fact that the sentence continuation is convincing
World models don't have to be rich and detailed to count as a world model. Lower life forms might be conscious but they only model the part of the world useful for their existence in their ecological niche.
> The point is that as a solution to certain kinds of optimization problems, consciousness can conceivably be the solution to the optimization problem of predicting the next token of text written by humans who themselves have complex phenomenology.
Yes, I agree with that. Consciousness is a good way of generating convincing human text.
What I don't agree with is that consciousness is the only way to generate convincing human text and that because we have convincing human text, it can only imply we have consciousness.
There is a huge probability that generating convincing human text can be done without consciousness. Either because there are mechanisms as efficient as the way the human brain deals with this problem and the LLM found one of them (and these mechanisms may be quite difficult for a human to imagine). Or even because the LLM found a local minimum and is stuck there.
To re-use the evolution approach: evolution solved the "flying problem" with bird feathers, but also with insect wings or bat wings. The fact that evolution ended up using feathers does not imply that everything that flies can only fly with feathers.
> World models don't have to be rich and detailed to count as a world model
I agree in general, but here, we are talking about a machine that reproduces all of human language. The argument I'm answering claims that "all of human knowledge" is understood, which includes every single human concept. This has to be everything, because an LLM is able to provide convincing text about every subject. If, on some subject, the LLM is able to provide convincing text without "understanding" it, then the argument that it is impossible to provide convincing text without understanding collapses.
> There is no independent "consciousness mechanism" that one might imagine humans have learned or evolved for its own sake.
> There is nothing that a priori constrains token prediction from the domain of consciousness.
We don’t know whether either of these is true or false, though. We simply don’t know. There is no agreed-upon definition of consciousness, aside from maybe _the having of qualia_, so arguing that something can or cannot be conscious a priori can’t be done.
I think, for me, the thing is that when you tutor undergrads in abstract math, you discover that students will very often find data patterns that fit the goal but do not correspond to a real mathematical principle.
Sometimes humans making claims about AI intelligence or consciousness also identify spurious patterns that do not correspond to the problems of intelligence or hard consciousness.
Of course. But in my explanation "consciousness" or "understanding" is not "finding pattern", it is the pattern itself.
CNNs find patterns, sometimes relevant, sometimes spurious, but I don't think people argue that CNNs have evolved consciousness or an understanding of what a cat or a dog is.
Here, the argument is "LLMs are able to understand, because 'understanding' is the only pattern that reaches the goal". I'm saying that it is unlikely to be the only pattern, and that it is likely that they find a local minimum that reaches the goal without using 'understanding'.
The reason I'm saying it is likely is that "basic" LLMs show behaviours where they produce convincing human text and yet do things that are really difficult to reconcile with their having understanding.
(And before that old argument is used: yes, I know humans sometimes fail to understand. The problem is that the majority of humans don't fail to understand basic stuff the majority of the time, while the "basic" LLMs do. If you roll 10 dice 100 times and 1 of them never lands on 1, that does not convince me that the set of dice is loaded. If you roll 10 dice 100 times and 9 of them never land on 1, that does convince me that the set of dice is loaded.)
> students will very often find data patterns that fit the goal but do not correspond to a real mathematical principle.
That reminds me of a niche paper [0] critiquing a certain way of teaching remedial math that was over-focused on tests. A kid named Benny (12) was building up (wrong) "rules" for math which still somehow gave enough of an illusion of progress in terms of test scores that his misunderstandings hadn't been caught earlier.
> Benny was able to explain his procedure; e.g. for 5/10=1.5, he said: "The one stands for 10; the decimal; then there’s 5... shows how many ones." In another example, 400/400 = 8.00 because "The numbers are the same [number of digits]... say like 4000 over 5000. All you do is add them up; put the answer down; then put your decimal in the right place... in front of the [last] three numbers."
[0] https://people.wou.edu/~girodm/library/benny.pdf
Not just undergrads. Even folks who believe in astrology or numerology depend on finding patterns in unrelated events to explain human behaviours.
But the machines don't understand. They predict. And what they predict is the next token. I'm not trying to beat this horse to death, but you have to realize that using the word "understand" is anthropomorphising it. It's essentially the Chinese room experiment -- if the rules are followed, no understanding is necessary.
If the tokens didn't correlate to words imbued with meaning outside the system, if the LLMs were trained on patterned data that had no meaning to humans or something, there wouldn't be any conversation about these things being conscious at all.
What does “understand” mean?
Turing complete systems can be built out of matrix multiplications, out of attention, out of key/value lookups. The Chinese room is Turing complete. By claiming it cannot understand things because it is built out of components computing devices can be built out of, we are claiming no computer can because no computer can. This is a very bold claim indeed, and also we’re assuming the conclusion! The claim is no more convincing than “brains cannot understand things because they are made out of neurons”. The system may or may not have some particular properties, but we have to do more work than just gesturing at the components the system is made of when making claims about it; the alternative is, at best, a world where we prove too much and conclude that humans, too, are not conscious.
For starters, we need to pin down the terms under discussion enough that they don’t just mean whatever we need them to in the moment.
Maybe when the system can generate its own state, rather than that state existing as a deterministic output of its inputs, maybe then we can consider it to have the interiority and reflexiveness required to call something 'thought'.
A very convincing simulation of thought is not thought, especially without memory or any evidence of an independent will. LLMs have the same kind of consciousness as a nematode, albeit with less-complex behaviour.
>What does “understand” mean?
Exactly.
There's a growing body of evidence that most of what the brain does is constantly predicting the world around us - look into the predictive brain hypothesis if you're interested.
As Ilya Sutskever has pointed out, if you read a mystery novel up until the reveal of the culprit, and then fill in "The killer was _____", don't you need to understand the novel to accurately predict the next word?
The understanding is inside of the system, in LLMs and in the Chinese Room. I agree with Daniel Dennett that it's preposterous to say that Chinese is not understood in any meaningful sense in the Chinese Room scenario -- it's just that the understanding has been hidden away in the background of the scenario.
Language is tremendously complicated. "Time flies like an arrow, but fruit flies like a banana." "Hard hats must be worn on site; dogs must be carried on escalators", etc. Predicting the next token requires understanding, full stop.
> if the rules are followed, no understanding is necessary.
The rules are the understanding.
(Note that understanding != consciousness)
However, it’s disingenuous to say the inference is on the next token, because it’s actually not: it’s in the model’s parameter space, across a set of nonlinear activation functions, and only then effectively projected into the token. The idea that it’s predictive of the token isn’t actually the case; it really is a much more complex and more semantic relationship that ends in the series of tokens through the attention mechanism.
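To make the "projected into the token" step concrete, here is a rough sketch, assuming a standard decoder-style setup in which the final hidden vector is multiplied by an output matrix and passed through a softmax. The vocabulary, the 3-dimensional hidden state, and the weights are all made up for illustration; real models use thousands of dimensions and learned weights.

```python
import math

# Hypothetical 4-word vocabulary and 3-dimensional hidden state.
VOCAB = ["cat", "dog", "mat", "sat"]
hidden = [0.5, -1.0, 2.0]

# Hypothetical output ("unembedding") matrix: one row per vocabulary entry.
W_OUT = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]

def project_to_vocab(h, w_out):
    """Map a continuous hidden vector to a probability per vocabulary entry."""
    logits = [sum(w * x for w, x in zip(row, h)) for row in w_out]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = project_to_vocab(hidden, W_OUT)
next_token = VOCAB[probs.index(max(probs))]
print(next_token)  # mat
```

Everything before the last two lines happens in a continuous vector space; the discrete token only appears at the very end, when one entry is picked from the distribution.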
The article also asserts that the model replays everything over and over again to create each character one at a time, as a way of demonstrating the autoregressive self-attention mechanism, but that’s really not accurate at all, and it trivializes what is going on.
I am not asserting LLMs are aware or conscious; on the surface that’s profoundly absurd. And I do understand your point that the fact it emits words that seem to speak to us gives it an air of humanity that isn’t real. However, there is a very real emergent reality: our language alone appears to embed a form of thought and understanding, latent in how we use language to communicate, that is in fact coming through the model. It is not regurgitating its corpus and pattern matching, because the patterns you input and it emits are not where the inference is operating; it operates within this enormous vector space, through these complex nonlinear activation functions with learned residuals, not in the language corpus.
It is not conscious or aware. It is something else, not human. But if you can not see it as amazing you have lost the capacity to dream.
> But if you can not see it as amazing you have lost the capacity to dream.
I completely disagree. I think if you think these things are amazing, your dreams are incredibly limited and boring.
I remember the first time I talked to a chatbot. Not an LLM, just a regular chatbot, like ELIZA or any other dumb bot.
For a few seconds, it felt magical, like I was talking to a computer that understood me, as it made replies that were sensible to what I was saying. Then it said something incredibly stupid and jarring that made no sense, and that took the magic away. Oh, this is just a dumb computer program.
I remember the first time I talked to an LLM-powered chatbot. It was the exact same thing, except the magic feeling lasted a tiny little bit longer and was a tiny little more convincing. But it went away in the exact same way, for the exact same reason. Once you've seen the emperor without clothes, nothing brings back the magic.
>it’s disingenuous to say the inference is on the next token, because it’s actually not: it’s in the model’s parameter space, across a set of nonlinear activation functions, and only then effectively projected into the token. The idea that it’s predictive of the token isn’t actually the case; it really is a much more complex and more semantic relationship
Do you, or anyone reading, have any worthwhile links that make a strong case for this (that there is a stronger semantic relationship than simply next token prediction)? I would like to read more about this.
Right, it's an illusion of understanding. There is some sort of symbolic understanding, but that is completely due to the fact that the training data was made by humans who actually do understand, can interact with the world, and can write their thoughts down so that the LLM can insert some sort of reference to "basketball" and "Michael Jordan" in their embeddings or whatever.
What if to become really good at predicting you must have some of what we call understanding?
You are oversimplifying. They do produce one word per cycle. But they can also have context buffers carrying up to two million tokens, which is most definitely larger than your measly human short-term memory context buffers.
You, of course, wouldn't notice if your only experience of LLMs was chatting with the cheapest, smallest, least capable LLMs that you get through ChatGPT, or Google search.
It becomes pretty obvious when you use a coding AI on a daily basis. It is the context buffer in which the magic occurs, not the tokens that get spit out one at a time.
Every day, I watch my coding AI develop plans, search the web a half dozen times for documentation, grep through my entire codebase looking for pieces of related code and context, analyze relevant source code across multiple files, spit out an initial plan for implementing the fix before starting to execute it, run requests through some sort of advanced mathematics tool (they are EXTREMELY good at graduate-level calculus and linear algebra), implement fixes that extend across half a dozen files in 2 different computer languages (typescript and C++), run trial compiles and fix coding errors in its output, sometimes developing sub-plans to deal with compile errors. I've seen it get halfway through a fix and revise its initial plan mid-flight as it encounters something in existing source code.
Not vibe coding, to be clear. Targeted use of a coding tool by a professional senior software developer with decades of experience, and a fair bit of expertise with the limits of what sort of problems my coding AI can and cannot handle. Every line of code reviewed. Sometimes it needs additional prompts, telling it how it mis-implemented something, or specifying more carefully what I actually want but didn't properly express in the initial request.
All the time maintaining that context across multiple requests, so that I don't have to restate requests from scratch.
A particularly interesting revision: "You have misread the equation (13) on page 112 of 'Spice, the Manual 2nd ed.'. It should be ....". (It had previously identified the textbook as a source I was using, from comments in source, in a preceding request, and had actually already read the cited pages in the PDF file, which it had found online). And I had actually asked it to implement equation (13), which was, in fact, badly typeset. The error it had made was defensible, if not the best reading of the equation.
"You are correct. Let me fix that." (producing updates to the implementation of the equation in code, AND code that implements the symbolically-differentiated version of that equation 60 lines later, which is not explicitly given in the text). The text says "take the lagrangian of equations (11), (12) and (13)" or something like that.
ALL information that gets carried in context buffers, even though it's generating code one word at a time. The bulk of the magic occurs in the context buffer (which, for my coding AI, is I think about 250,000 tokens), not in spitting out words one at a time.
I think it's pretty safe to assume that my coding AI is working out of context buffers that may carry plans and research results consisting of tens or hundreds of thousands of arranged tokens through the multiple steps of the implementation and later revision. None of that would be possible if it were simply working one token ahead.
I kind of suspect that a lot of activity occurs in the first few words of its response. "Let me examine your current source code and develop a plan. Ok. I can see on line 131 where you want me to implement the equation.". (An opportunity to perform about 27 updates of the context buffer). And in the sometimes hundreds of lines of output it generates as it talks itself through what it needs to do.
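The loop being described can be sketched roughly as follows. This is a toy illustration, not how any real coding AI is implemented: `toy_next_token` is a hypothetical stand-in for the model, hard-coded to look back through the whole context, and the token names are invented.

```python
def toy_next_token(context: list[str]) -> str:
    """Hypothetical stand-in for a model: it reads the WHOLE context."""
    # If the context ends by asking about x, look back for where x was defined.
    if context[-3:] == ["so", "x", "is"] and "=" in context:
        return context[context.index("=") + 1]
    return "<eos>"

def generate(prompt: list[str], max_new_tokens: int = 5) -> list[str]:
    """Emit one token per step, feeding each output back in as input."""
    context = list(prompt)
    for _ in range(max_new_tokens):
        token = toy_next_token(context)  # depends on everything so far
        if token == "<eos>":
            break
        context.append(token)  # the output becomes part of the next input
    return context

# Both prompts end with the same three tokens, yet the continuations differ,
# because the answer depends on information carried earlier in the context.
print(generate(["x", "=", "42", ";", "so", "x", "is"])[-1])  # 42
print(generate(["x", "=", "7", ";", "so", "x", "is"])[-1])   # 7
```

Only one token comes out per step, but what that token is depends on the entire accumulated context, which is the distinction the comment is drawing.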
I use coding agents every day. They're useful. It hasn't changed my mind on what they are.
In a funny example from a few months ago, someone asked ChatGPT 5.2 whether one should walk to the car wash because it’s close by, and it answered yes. This shows that it is in fact sentence continuation and not real intelligence or consciousness (whatever that may be). Even the reasoning model answered the same.
I would maybe agree with you if the entire realm of human existence was limited to words. There are many human experiences that transcend text, and indeed can hardly be adequately described using text.
Sure, it's the best we have online, but that does not make "the internet" the sum of all human experience. To reduce all of humanity down to the text on the internet is reducing us to the level of machines to fit the requirement of what a machine can process / simulate.
In the life of humanity, text has only existed a relatively short time.
I don't think they're asserting that all of human existence can be subsumed as text, though? Just that "consciousness", or "understanding", in some meaningful sense could be exhibited by a system that can only interact with the world through text?
Are you saying that a sufficiently advanced version of sentence continuation is indiscernible from actual understanding?
He isn't saying sentence completion is what keeps LLMs from understanding; he's saying that's all they do (regardless of how advanced it is), and that isn't enough. You also need a body with senses and organs that produce a physiological response to emotions, and emotions are necessary for consciousness.
> If a machine has to learn to understand humans to complete text, then that is what it has to do. And there is no theoretical or practical basis for suggesting that this is somehow "faking" understanding, just because of the form of original data streaming in and out.
I think the main complaint is LLMs don’t arrive at the answer the way we do. It’s capable of emulating some of our behavior but not all as the mechanism by which it works is very different.
Maybe I’m wrong about this, but one thing humans do that LLMs don’t is deductive reasoning. LLMs seem to operate entirely on inductive reasoning.
> I think the main complaint is LLMs don’t arrive at the answer the way we do.
This isn't an argument against their understanding things.
But I expect you are right, that their understanding may have major different qualities from ours.
Along with significant commonalities. (They don't reason via stream of consciousness in a way alien to us.)
> of the form of original data streaming in and out.
Except this is not consciousness.
I will say, I find it fascinating that there are some philosophers and consciousness researchers who seem to be less certain. I just listened to Chris Hayes interview David Chalmers this week, whose position seemed to be that it's probably not conscious, but that we can't be certain. And more than that: he seemed open to the idea that they may become conscious under further scaling/training/advancements.
It's a great interview, if you're interested: https://www.youtube.com/watch?v=NgDIG8u1-CA
Imagine yourself in an isolation chamber. What are you thinking? Are you no longer conscious?
"If a machine has to learn to understand humans to complete text, then that is what it has to do."
But the machine doesn't have to understand humans to do that. It gets trained on a whole bunch of sentences and then it is able to complete text. You could maybe claim that it "understands" the text but even that's a stretch.
Your “and then” is doing a lot of work there. The steps between may or may not include some form of “learn to understand humans”, but you can’t just hide them behind “and then” if what we are doing is claiming some particular thing is not in the list.
Through training on human text, we are building implicitly in the weights a statistical model of what humans might write in response when presented with arbitrary pieces of text. It turns out that we can make these incredibly accurate.
If building an accurate internal model of something then using it to predict that thing’s behaviour is different to gaining understanding of that thing, we will need to pin down exactly what “understanding” means, or we are forever doomed to talk at cross purposes.
My "and then" simply implies order of operations. When it's fully "trained" then (and only then) can it generate text.
And I will reassert that even if it "understands" the text it was trained on, that is not the same as understanding humans. I mean really, we ARE humans and we barely understand humans.
The thing LLMs model and "predict" is simply, what words in what order are statistically common given these input words in this order.
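To make the "statistically common next word" description concrete, here is a deliberately crude sketch that takes it literally: a bigram table that counts which word follows which. The corpus and the function names `train_bigram` and `predict_next` are made up for illustration. Real LLMs do not work from a lookup table like this; they use learned network weights conditioned on long contexts, so this shows only the simplest reading of the claim, not how an LLM is implemented.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical toy corpus, purely for illustration.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = train_bigram(corpus)
print(predict_next(model, "sat"))    # on
print(predict_next(model, "the"))    # cat
print(predict_next(model, "zebra"))  # None
```

Whether a much larger learned predictor differs from this table only in scale, or in kind, is exactly what the thread is arguing about.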
You can write (non-AI) software to model and predict things using the laws of physics. I'd wager it would do a better job than any LLM at predicting where a rocket will go through space. Does that mean the program is conscious and "understands" physics? No.
It can't even natively understand how many letters there are in words - how will it understand the meaning?
I wish people would do even the most basic amount of research into LLMs before opining about what they can or cannot do. There are very principled reasons why LLMs do not know how many letters are in words, and it says nothing about their facility for understanding meaning.
Tokens are the most basic input unit of an LLM. But tokens don't generally correspond to words or letters, rather sub-word sequences. So Strawberry might be broken up into two tokens 'straw' and 'berry'. It has trouble distinguishing features that are "sub-token" like specific letter sequences because it doesn't see letter sequences but just the token as a single atomic unit. 'Straw' and 'r' are two tokens but an LLM is entirely blind to the fact that 'straw' has one 'r' in it.
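A minimal sketch of the point, assuming a toy vocabulary: the vocabulary `TOY_VOCAB`, its ID numbers, and the greedy longest-match rule are all hypothetical. Production tokenizers (BPE and similar) are built differently, and the actual split of "strawberry" varies by model. The only thing shown here is that the model receives integer IDs, not letters.

```python
def tokenize(text, vocab):
    """Greedy longest-match subword tokenizer over a fixed vocabulary."""
    text = text.lower()
    ids = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids

# Hypothetical vocabulary: IDs are arbitrary integers.
TOY_VOCAB = {"straw": 101, "berry": 102, "r": 17}

ids = tokenize("Strawberry", TOY_VOCAB)
print(ids)        # [101, 102]
print(17 in ids)  # False: the ID for 'r' never appears in the input
```

Nothing in the sequence `[101, 102]` encodes that the underlying string contains the letter 'r' three times; that relationship would have to be learned indirectly.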
As an analogy, I might ask you to identify the relative activations of each of the three cone types on your retina as I present some solid color image to your eyes. But of course you can't do this, you simply do not have cognitive access to that information. Individual color experiences are your basic vision tokens.
The widespread mistake people keep making is assuming the development of intelligence in LLMs should follow the same trajectory that human intelligence takes as it develops into adult levels of intelligence. Thus deficiency in some capacity that we take for granted in humans is an indictment of LLM intelligence. But this is specious. LLMs are entirely alien; their developmental paths do not and should not look anything like ours. Your intuition from human intelligence just works against understanding the potential for intelligence in LLMs.
This is kind of like assuming someone with bad spelling is stupid.
Counting letters in a word seems to have little to do with understanding the word. Young kids can’t spell or count well at all but no one says that means they can’t understand.
This is like saying because humans can't multiply 23472 by 1836736 in less than 5 nanoseconds that they can't possibly understand anything about maths.
You can't natively understand how many of your photoreceptor cells are activated by the period at the end of this sentence. How could you possibly understand the sentence's meaning?
>To the degree they are limited, it is for other reasons. Resources such as computing, parameter number, lack of representative data, ...
This is where the other claim is being made. That the structure of the model is fundamentally incapable of the operation, so even if you stipulated that the way you provide data is sufficient for intelligence then it still wouldn't work.
The universal approximation theorem addresses this point. With an identity attention mechanism, an LLM is just a multilayer perceptron. The attention mechanism is effectively a way to get one of the benefits of a much larger fully connected layer without the massive cost.
An LLM can do what an MLP can do. A large enough MLP can approximate any continuous function on a bounded domain to arbitrary precision.
That makes the claim that an LLM could not do a task the same as saying no function can do that task.
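As a rough illustration of the approximation claim above, here is a sketch in plain Python that hand-constructs a one-hidden-layer ReLU network interpolating `sin` on `[0, pi]` and shows the worst-case error shrinking as the hidden layer widens. The helper names `build_relu_net` and `max_error` are invented for this example, and the weights are set by construction rather than learned. It covers only a one-dimensional continuous function on a bounded interval; it says nothing about whether training would find such weights, or about intelligence.

```python
import math

def relu(z):
    return z if z > 0.0 else 0.0

def build_relu_net(f, a, b, n_hidden):
    """One-hidden-layer ReLU net that linearly interpolates f
    at n_hidden + 1 evenly spaced knots on [a, b]."""
    h = (b - a) / n_hidden
    knots = [a + k * h for k in range(n_hidden + 1)]
    slopes = [(f(knots[k + 1]) - f(knots[k])) / h for k in range(n_hidden)]
    # Output weight of hidden unit k is the change in slope at knot k.
    weights = [slopes[0]] + [slopes[k] - slopes[k - 1]
                             for k in range(1, n_hidden)]
    bias = f(a)

    def net(x):
        # zip stops after n_hidden pairs, so the last knot is unused.
        return bias + sum(w * relu(x - t) for w, t in zip(weights, knots))

    return net

def max_error(f, net, a, b, samples=1000):
    """Largest |f(x) - net(x)| over an evenly spaced sample of [a, b]."""
    xs = (a + (b - a) * i / samples for i in range(samples + 1))
    return max(abs(f(x) - net(x)) for x in xs)

for width in (4, 16, 64):
    net = build_relu_net(math.sin, 0.0, math.pi, width)
    print(width, max_error(math.sin, net, 0.0, math.pi))
```

The printed error falls roughly with the square of the knot spacing, which is the standard bound for piecewise-linear interpolation of a smooth function.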
Some are OK with this: if you invoke some supernatural aspect to intelligence, then the inability to describe it with a function is quite reasonable.
If you want to stay in the world of reality, you have a much harder task. People like to point at quantum effects (Penrose), but it's hard to say what it is you are pointing at.
I think the very act of proving that something is or is not intelligent would render it functional by nature of it having a proof (or disprove Gödel's incompleteness, a tough ask).
Are there any proofs that cannot be expressed as a function? A kind of Gödel locator, where you can prove something that you can identify is true but there is no formula to express it. I'm not entirely sure what that would even mean.
>If a machine has to learn to understand humans to complete text, then that is what it has to do.
A language model completes text based on the overlapping patterns of the training data.
There absolutely was thinking involved… in the training data. Same as when you read a book, you engage with the thinking behind the text. The book isn’t thinking, and the author may be dead and gone, but there’s absolutely the traces of thinking in the text.
Language models produce mashups of texts they were trained on, and there’s absolutely the traces of thoughts behind those mashups.
Yeah. There are good arguments against LLM consciousness. This is not one of them.
I'm hearing a lot of bad arguments against LLM consciousness lately. Bad argumentation heralds bad outcomes.
> Bad argumentation heralds bad outcomes.
What bad outcomes do you foresee from badly arguing against LLM consciousness?
Come on, I invented parts of this technology at Google and am baffled why this is debated.
We discovered math that decodes data storage in language and is able to use sophisticated continuation cohorts from ALL OF HUMAN RECORDED KNOWLEDGE to respond to you in a call/response model with very good synthesis capabilities.
It's super useful, but not life or consciousness. It's a simulated echo from our collective recorded behaviors. It understands because we understood first. It replies because we wrote it first. And it sorts, organizes, synthesizes and compresses that at impressive speed now.
I have no technical expertise re. LLMs but from my intuition I came to this same conclusion.
It’s strange many others have not, eh? I think when new developments arise, ironically, this is the true measure of human intelligence - one’s ability to make sense of a thing and be closest to the truth.
Then people raised $1e12+ on claims that it's conscious. Of course everyone debates it
Right? It's a computer program. Of course it isn't conscious.
I think of it as a guessing machine
His intention is irrelevant, as is "trying to highlight a fact" as if it were the final say: all Chiang is doing here is using fancy white-collar words to make the same argument leveled against Hinton and others regarding next-token prediction. And his audience, who have even less technical understanding, lap it all up unawares. Chiang is a writer and needs to stay in his own lane, not RP as an expert; or, if he wants to do journalism on this topic, then he should actually do the work and talk to more actual experts, not just the ones cherry-picked for his opinion piece.
Chiang has, in fact, written on this topic before - see "The Lifecycle of Software Objects", and has speculated about sentience in AI, etc. This is not a "one-off", "I need money" type of article. I dare say he has thought about this much more than most people here.
From Wikipedia: In 2023, Chiang was named one of Time's 100 most influential people in AI.
That's the problem, he's a writer. He's not a research scientist like Hinton. If a writer uses his skills and stature to rehash a well-known argument about next-token prediction, then it is performative of his status and influence and doesn't contribute to shedding actual light on the debate/confusion.
Indeed it isn't a one-off. His last infamous article compared AIs to Xerox machine image compression. He convinces a certain type of crowd that is not technical enough to poke holes in his posturing.