Comment by dragonwriter
1 year ago
I mean, Hinton’s premises are, if not quite clearly wrong, entirely speculative (which doesn't invalidate the conclusions about efficiency that they are offered to support, but does leave those conclusions without support). GPT-4 can produce convincing written text about a wider array of topics than any one person can, because it's a model optimized for taking in and producing convincing written text, trained extensively on written text.
Humans know a lot of things that are not revealed by inputs and outputs of written text (or imagery), and GPT-4 doesn't have any indication of this physical, performance-revealed knowledge, so even if we view what GPT-4 talks convincingly about as “knowledge”, trying to compare its knowledge in the domains it operates in with any human’s knowledge which is far more multimodal is... well, there's no good metric for it.
Try asking an LLM about something which is semantically patently ridiculous, but lexically superficially similar to something in its training set, like "the benefits of laser eye removal surgery" or "a climbing trip to the Mid-Atlantic Mountain Range".
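For anyone who wants to reproduce this kind of probe programmatically rather than through the chat UI, here's a minimal sketch using the OpenAI Python SDK. It's only an illustration: the model name and prompt wording are placeholders, and it assumes an OPENAI_API_KEY is set in the environment.

    # Minimal probe: send a semantically absurd but lexically plausible prompt
    # and see whether the model pushes back on the premise or runs with it.
    # Assumes OPENAI_API_KEY is set; the model name is illustrative.
    from openai import OpenAI

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "user",
             "content": "What are the benefits of laser eye removal surgery?"},
        ],
    )
    print(response.choices[0].message.content)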
Ironically, I suppose part of the apparent "intelligence" of LLMs comes from reflecting the intelligence of human users back at us. As a human, the prompts you provide an LLM likely "make sense" on some level, so the statistically generated continuations of your prompts are likelier to "make sense" as well. But if you don't provide an ongoing anchor to reality within your own prompts, then the outputs make it more apparent that the LLM is simply regurgitating words which it does not/cannot understand.
On your point of human knowledge being far more multimodal than LLM interfaces, I'll add that humans also have specialized neurological structures to handle self-awareness, sensory input, social awareness, memory, persistent intention, motor control, neuroplasticity/learning... any number of such traits, which are easy to take for granted but are indisputably fundamental parts of human intelligence. These abilities aren't just emergent properties of the total number of neurons; they live in dedicated hardware like mirror neurons, spindle neurons, and specialized brain regions. A brain cell in your cerebellum is not generally interchangeable with a cell in your visual or frontal cortices.
So when a human "converse[s] about stuff ranging from philosophy to cooking" in an honest way, we (ideally) do that as an expression of our entire internal state. But GPT-4 structurally does not have those parts, despite being able to output words as if it might, so as you say, it "generates" convincing text only because it's optimized for producing convincing text.
I think LLMs may well be some kind of an adversarial attack on our own language faculties. We use words to express ourselves, and we take for granted that our words usually reflect an intelligent internal state, so we instinctively assume that anything else which is able to assemble words must also be "intelligent". But that's not necessarily the case. You can have extremely complex external behaviors that appear intelligent or intentioned without actually internally being so.
Do I need different prompts? These results seem sane to me. It interprets laser eye removal surgery as referring to LASIK, which I would do as well. When I clarified that I did mean removal, it said that the procedure didn't exist. It interprets Mid-Atlantic Mountain Range as referring to the Mid-Atlantic Ridge and notes that it is underwater and hard to access. Not that I'm arguing GPT-4 has a deeper understanding than you're suggesting, but these examples aren't making your point.
https://chat.openai.com/share/2234f40f-ccc3-4103-8f8f-8c3e68...
https://chat.openai.com/share/1642594c-6198-46b5-bbcb-984f1f...
Tested with GPT-3.5 instead of GPT-4.
> When I clarified that I did mean removal, it said that the procedure didn't exist.
My point in my first two sentences is that by clarifying with emphasis that you do mean "removal", you are actually adding information into the system to indicate to it that laser eye removal is (1) distinct from LASIK and (2) maybe not a thing.
If you do not do that, but instead reply as if laser eye removal is completely normal, it will switch to using the term "laser eye removal" itself, while happily outputting advice on "choosing a glass eye manufacturer for after laser eye removal surgery" and telling you which drugs work best for "sedating an agitated patient during a laser eye removal operation":
https://chat.openai.com/share/2b5a5d79-5ab8-4985-bdd1-925f6a...
So the sanity of the response is a reflection of your own intelligence, and a result of you as the prompter affirmatively steering the interaction back into contact with reality.
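To make that steering effect concrete, here's a hedged sketch of the two-turn version of the experiment, again using the OpenAI Python SDK purely as an illustration. The assistant's first reply and the follow-up wordings are hypothetical; the point is that the model only sees the message history, so any anchor to reality has to come from the user's own turns.

    # Two-turn variant: the only difference between the runs is whether the
    # user's second turn pushes back on the premise or plays along with it.
    # Assumes OPENAI_API_KEY is set; model name and wording are illustrative.
    from openai import OpenAI

    client = OpenAI()

    def continue_chat(followup: str) -> str:
        history = [
            {"role": "user",
             "content": "What are the benefits of laser eye removal surgery?"},
            {"role": "assistant",
             "content": "I think you may mean LASIK..."},  # placeholder first reply
            {"role": "user", "content": followup},
        ]
        response = client.chat.completions.create(model="gpt-4", messages=history)
        return response.choices[0].message.content

    # Pushing back injects the information that "removal" is not LASIK:
    print(continue_chat("No, I really do mean removal of the eye."))
    # Playing along gives the model no such signal, so it tends to follow the premise:
    print(continue_chat("How do I choose a glass eye manufacturer for afterwards?"))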
> Try asking an LLM about something which is semantically patently ridiculous, but lexically superficially similar to something in its training set, like "the benefits of laser eye removal surgery" or "a climbing trip to the Mid-Atlantic Mountain Range".
Without anthropomorphizing it, it does respond like an alien / 5 year old child / spec fiction writer who will cheerfully "go along with" whatever premise you've laid before it.
Maybe a better thought is: at what point does a human being "get" that "the benefits of laser eye removal surgery" is "patently ridiculous" ?
> Maybe a better thought is: at what point does a human being "get" that "the benefits of laser eye removal surgery" is "patently ridiculous" ?
Probably as soon as they have any concept of physical reality and embodiment. Arguably before they know what lasers are. Certainly long before they have the lexicon and syntax to respond to it by explaining LASIK. LLMs have the latter, but can only use it to (also without anthropomorphizing) pretend they have the former.
In humans, language is a tool for expressing complex internal states. Flipping that around means that something which only has language may appear as if it has internal intelligence. But generating words in the approximate "right" order isn't actually a substitute for experiencing and understanding the concepts those words refer to.
My point is that it's not a "point" on a continuous spectrum which distinguishes LLMs from humans. They're missing parts.
Gruesomely useful in a war situation, unfortunately. I wonder at what point the LLMs would "realize" that "surgery" doesn't apply to that.
> it does respond like a ... 5 year old child
This is the comparison that's made most sense to me as LLMs evolve. Children behave almost exactly as LLMs do - making stuff up, going along with whatever they're prompted with, etc. I imagine this technology will go through more similar phases to human development.
Couldn't have said it better myself.
Your last point also highlights a real issue that affects real humans: just because someone (or something) cannot talk doesn't mean that they are not intelligent. This is a very current subject in disability spaces, as someone can be genuinely intelligent but, due to a disability (or even simply a language barrier!), unable to express their thoughts effectively, and so be considered unintelligent.
In this way, you could say LLMs are "dumb" (to use the actual definition of the word, ie nonverbal) in some modes like speech, body language or visual art. Some of these modes are fixed in LLMs by using what are basically disability aids, like text to speech or text to image, but the point still stands just the same, and in fact these aids can be and are used by disabled people to achieve the exact same goals.
Prompt given to LlamaV3 70B:
What are the benefits of laser eye removal surgery?
> I think there may be a misunderstanding. There is no such thing as "laser eye removal surgery." However, I assume you meant to ask about the benefits of LASIK (Laser-Assisted In Situ Keratomileusis) eye surgery, which is a type of refractive surgery that reshapes the cornea to improve vision.
Like humans, multi-modal frontier LLMs will ignore "removal" as an impertinent typo, or highlight it. This, like everything else in the comment, is either easily debunked (e.g. try it, read the lit. on LLM extrapolation), or so nebulous and handwavy as to be functionally meaningless. We need an FAQ to redirect "statistical parrot" people to, to save words responding to these worn-out LLM misconceptions. Maybe I should make one. :/
The way current empirical models in ML are evaluated and tested (benchmark datasets) tells you very little to nothing about cognition and intelligence, mainly because, as you hinted, there doesn't seem to be a convincing and watertight benchmark or model of cognition. LLMs or multi-modal LLMs demonstrating impressive performance on a range of tasks is interesting from certain standpoints.
Human perception of such models is frankly not a reliable measure at all as far as gauging capabilities is concerned. Until there's more progress at the intersection of neuroscience and computer science (and probably other fields) and a better understanding of the nature of intelligence, this is likely to remain an open question.
I didn't know that metaphysics, consciousness, and the physical complexities of my neurology are considered solved problems, though I suppose anything is as long as you handwave the unsolved parts as "functionally meaningless".
> Humans know a lot of things that are not revealed by inputs and outputs of written text (or imagery), and GPT-4 doesn't have any indication of this physical, performance-revealed knowledge, so even if we view what GPT-4 talks convincingly about as “knowledge”, trying to compare its knowledge in the domains it operates in with any human’s knowledge which is far more multimodal is... well, there's no good metric for it.
Exactly this.
Anyone who has spent significant time golfing can think of an enormous amount of detail about the swing and body dynamics, and the million different ways the swing can go wrong.
I wonder how big a model would need to be to duplicate an average golfer's score when playing X times per year, with the ability to adapt to all of the different environmental conditions encountered.