
Comment by byearthithatius

9 days ago

"There exists a generally accepted baseline definition for what crosses the threshold of intelligent behavior" not really. The whole point they are trying to make is that the capability of these models IS ALREADY muddying the definition of intelligence. We can't really test it because the distribution its learned is so vast. Hence why he have things like ARC now.

Even if it's just gradient-descent-based distribution learning and there is no "internal system" (whatever you think that should look like) to support learning the distribution, the question is whether that is more than what we are doing, or whether we are starting to replicate our own mechanisms of learning.

People’s memories are so short. Ten years ago the “well accepted definition of intelligence” was whether something could pass the Turing test. Now that goalpost has been completely blown out of the water, and people are scrambling to come up with a new one that excludes LLMs.

A useful definition of intelligence needs to be measurable, based on inputs and outputs, not internal state. Otherwise you run the risk of dictating how you think intelligence should manifest rather than describing what it actually is. The former is a prescription; only the latter is a true definition.

  • I frequently see this characterization and can't agree with it. If I say "well I suppose you'd at least need to do A to qualify" and then later say "huh I guess A wasn't sufficient, looks like you'll also need B" that is not shifting the goalposts.

    At worst it's an incomplete and ad hoc specification.

    More realistically it was never more than an educated guess to begin with, about something that didn't exist at the time, still doesn't appear to exist, is highly subjective, lacks a single broadly accepted rigorous definition to this very day, and ultimately boils down to "I'll know it when I see it".

    I'll know it when I see it, and I still haven't seen it. QED

    • > If I say "well I suppose you'd at least need to do A to qualify" and then later say "huh I guess A wasn't sufficient, looks like you'll also need B" that is not shifting the goalposts.

      I dunno, that seems like a pretty good distillation of what moving the goalposts is.

      > I’ll know it when I see it, and I haven’t seen it. QED

      While pithily put, that’s not a compelling argument. You feel that LLMs are not intelligent. I feel that they may be intelligent. Without a decent definition of what intelligence is, the entire argument is silly.

      2 replies →

  • LLMs can’t pass an unrestricted Turing test. LLMs can mimic intelligence, but if you actually try to exploit their limitations, the deception is still trivial to unmask.

    Various chat bots have long been able to pass more limited versions of a Turing test. The most extreme constraint allows simply replaying a canned conversation, which, with a helpful human assistant, is indistinguishable from a human. But exploiting limitations of a testing format doesn’t have anything to do with testing for intelligence.

  • I’ve realized while reading these comments that my confidence that LLMs are intelligent has increased significantly. Rather than argue any specific test, I believe no one can come up with a text-based intelligence test that 90% of literate adults can pass but the top LLMs fail.

    This would mean there’s no definition of intelligence you could tie to a test where humans would be intelligent but LLMs wouldn’t.

    A maybe more palatable idea is that treating “intelligence” as a binary is insufficient. I think it’s more of an extremely skewed distribution. With how far humans are above the rest, you didn’t have to nail the cutoff point to get us on one side and everything else on the other. Maybe chimpanzees and dolphins slip in. But now the LLMs are much closer to humans. That line is harder to draw. Actually, it’s not possible to draw it so that people are on one side and LLMs on the other.

    • Why presuppose that it's possible to test intelligence via text? Most humans have been illiterate for most of human history.

      I don't mean to claim that it isn't possible, just that I'm not clear why we should assume that it is or that there would be an obvious way of going about it.

      3 replies →

How does an LLM muddy the definition of intelligence any more than a database or search engine does? They are lossy databases with a natural language interface, nothing more.

  • Ah, but what is in the database? At this point it's clearly not just facts, but problem-solving strategies and an execution engine. A database of problem-solving strategies which you can query with a natural language description of your problem and it returns an answer to your problem... well... sounds like intelligence to me.

  • Databases and search engines are deterministic. Humans and LLMs are not.

    • LLMs are completely deterministic. Their fundamental output is a vector representing a probability distribution over the next token, given the model weights and context. Given the same inputs, an identical output vector will be produced 100% of the time.

      This fact is relied upon by, for example, https://bellard.org/ts_zip/, a lossless compression system that would not work if LLMs were nondeterministic.

      In practice most LLM systems use this distribution (along with a “temperature” multiplier) to make a weighted random choice among the tokens, giving the illusion of nondeterminism. But there’s no fundamental reason you couldn’t for example always choose the most likely token, yielding totally deterministic output.
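
      To make that concrete, here is a minimal sketch (toy numbers, nothing from any real model or library API) of how a fixed logits vector, i.e. the deterministic forward-pass output, becomes a token choice, and how the temperature knob separates greedy, fully deterministic decoding from the usual weighted random draw:

          import numpy as np

          def softmax(logits, temperature=1.0):
              # Temperature rescales the logits before normalizing to probabilities.
              z = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
              z -= z.max()  # subtract the max for numerical stability
              p = np.exp(z)
              return p / p.sum()

          def pick_token(logits, temperature=0.0, rng=None):
              # temperature == 0 -> argmax: greedy, fully deterministic decoding
              # temperature > 0  -> weighted random draw: the apparent nondeterminism
              if temperature == 0.0:
                  return int(np.argmax(logits))
              rng = rng or np.random.default_rng()
              return int(rng.choice(len(logits), p=softmax(logits, temperature)))

          logits = [2.0, 1.0, 0.1]        # stand-in for a model's next-token scores
          print(pick_token(logits))        # always 0: same logits, same choice
          print(pick_token(logits, 0.8))   # varies from run to run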

      If you want to learn more, this is an excellent and accessible series going over how transformer systems work: https://youtu.be/wjZofJX0v4M

      2 replies →

    • The LLM's output is chaotic relative to the input, but it's deterministic, right? Same settings, same model, same input... same output? Where does the chain get broken here?

      4 replies →

    • The only reason LLMs are stochastic instead of deterministic is a random number generator. There is nothing inherently non-deterministic about LLM algorithms unless you turn up the "temperature" of selecting the next word. The fact that determinism can be changed by turning a knob is clear evidence that they are closer to a database or search engine than a human.
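
      A quick way to see this (a toy sketch, not tied to any particular model): all of the randomness lives in the RNG used for sampling, so seeding it makes repeated "completions" identical even with a nonzero temperature.

          import numpy as np

          probs = [0.7, 0.2, 0.1]  # stand-in for a model's next-token distribution

          def sample_sequence(seed, steps=5):
              rng = np.random.default_rng(seed)  # fixed seed -> identical draws
              return [int(rng.choice(len(probs), p=probs)) for _ in range(steps)]

          print(sample_sequence(42))  # some sequence of token ids
          print(sample_sequence(42))  # the exact same sequence: same seed, same draws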

      2 replies →