Comment by n4r9
6 days ago
> if they are going to be mass deployed in society
This is the crucial point. The vision is massive-scale deployment of agents with capabilities far beyond humans', but whose edge-case behaviours are often more difficult to predict. "Humans would also get this wrong sometimes" is not compelling.
It's also off-the-charts implausible to say that our performance on adding up substantially degrades with the introduction of irrelevant information. Almost all cases of our use of arithmetic in daily life come with vast amounts of irrelevant information.
Any person who looked at a restaurant table and couldn't review the bill because there were kids' drawings of cats on it would be severely mentally disabled, and would never be employed in any situation that required reliable arithmetic skills.
I cannot understand these ever-more-absurd levels of denying the most obvious, commonplace, basic capabilities that the vast majority of people have and use regularly in their daily lives. It should be a wake-up call to anyone professing this view that they're off the deep end in copium.
> It's also off-the-charts implausible to say that our performance on adding up substantially degrades with the introduction of irrelevant information
Didn't you ever sit an exam next to an irresistibly gorgeous girl? Or go to work in the middle of a personal crisis? Or fill out a form while people were rowing in your office? Or write code with a pneumatic drill banging away outdoors?
That's the kind of irrelevant information in our working context that will often degrade human performance. Can you really argue noise in a prompt is any different?
"Intelligence" is a metaphor used to describe LLMs (, AI) used by those who have never studied intelligence.
If you had studied intelligence as a science of systems which are intelligent (i.e., animals, people, etc.), then this comparison would seem absurd to you: mendacious, and designed to confound.
The desperation to find some scenario in which, at the most superficial level, an intelligent agent "benchmarks like an LLM" is a pathology of thinking designed to lure the gullible into credulity.
If an LLM is said to benchmark on arithmetic like a person doing math whilst being tortured, then the LLM cannot do math -- just as a person being tortured cannot. I cannot begin to think what this is supposed to show.
LLMs, and all statistical learners based on interpolating historical data, have a dramatic sensitivity to permutations of their inputs, under which their performance collapses. A small permutation of the input is, if we must analogise, "like torturing a person to the point their mind ceases to function". This is because these learners do not have representations of the underlying problem domain which fit the "natural, composable, general" structures of that domain -- they are just fragments of text data put in a blender. You'll get performance only when that blender isn't being nudged. (A toy probe of exactly this sensitivity is sketched below.)
The reason one needs to harm a person to the point that they are profoundly disabled and cannot think, in order to get this kind of performance, is that at that point a person cannot be said to be using their mind at all.
This is why the analogy holds only in a very superficial way: LLMs do not analogise to functioning minds; they are not minds at all.
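For what it's worth, the sensitivity claim above is easy to probe. Here is a minimal Python sketch, in the spirit of distractor-injection benchmarks like GSM-Symbolic: generate a trivial bill-totalling problem, pad it with irrelevant sentences, and compare exact-match accuracy with and without the noise. `ask_model` is a hypothetical placeholder, not a real API; swap in whatever LLM client you actually use.

```python
import random

# Irrelevant sentences to inject, echoing the "cat drawings on the bill" example.
DISTRACTORS = [
    "The receipt has a child's drawing of a cat on it.",
    "The restaurant was painted blue last spring.",
    "One of the diners is wearing a red scarf.",
]

def make_prompt(prices, n_distractors):
    """Build a bill-totalling question, optionally padded with noise."""
    lines = [f"Item {i + 1} costs {p} dollars." for i, p in enumerate(prices)]
    lines += random.sample(DISTRACTORS, k=n_distractors)
    random.shuffle(lines)  # interleave the noise with the real facts
    return " ".join(lines) + " What is the total bill in dollars? Answer with a number only."

def ask_model(prompt):
    """Hypothetical placeholder -- swap in your actual LLM API call."""
    raise NotImplementedError

def accuracy(trials, n_distractors):
    """Exact-match accuracy on the total, with n_distractors noise sentences."""
    correct = 0
    for _ in range(trials):
        prices = [random.randint(1, 50) for _ in range(4)]
        reply = ask_model(make_prompt(prices, n_distractors))
        correct += reply.strip() == str(sum(prices))
    return correct / trials

# Compare accuracy(100, 0) against accuracy(100, 3).
```

A large gap between the two numbers is the collapse being described; a person totalling the same bill simply ignores the cat drawing.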