Comment by alyxya

4 days ago

The impactful innovations in AI these days aren't really from scaling models to be larger. Higher benchmark scores are more concrete to show, and this implies higher intelligence, but higher intelligence doesn't necessarily translate to every user feeling the model has significantly improved for their use case. Models sometimes still struggle with simple questions like counting letters in a word, and most people don't have a use case of a model needing phd level research ability.

Research now matters more than scaling when research can fix limitations that scaling alone can't. I'd also argue that we're in the age of product, where the integration of product and models plays a major role in what they can do combined.

> this implies higher intelligence

Not necessarily. The problem is that we can't precisely define intelligence (or, at least, haven't so far), and we certainly can't (yet?) measure it directly. So what we have are certain tests whose scores, we believe, correlate with that vague thing we call intelligence in humans. Except these test scores can correlate with intelligence (whatever it is) in humans and at the same time correlate with something that's not intelligence in machines. So a high score may well imply high intelligence in humans but not in machines (e.g. perhaps because machine models may overfit more than a human brain does, so an intelligence test designed for humans doesn't necessarily measure the same thing we mean by "intelligence" when applied to a machine).

This is like the following situation: Imagine we have some type of signal, and the only process we know produces that type of signal is process A. Process A always produces signals that contain a maximal frequency of X Hz. We devise a test for classifying signals of that type that is based on sampling them at a frequency of 2X Hz. Then we discover some process B that produces a similar type of signal, and we apply the same test to classify its signals in a similar way. Only, process B can produce signals containing a maximal frequency of 10X Hz and so our test is not suitable for classifying the signals produced by process B (we'll need a different test that samples at 20X Hz).
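The sampling analogy can be made concrete with a toy sketch (taking X = 1 Hz purely for illustration): a high-frequency signal from process B, sampled at 2X Hz, produces exactly the same samples as a much lower-frequency signal, so the test silently misclassifies it.

```python
import math

fs = 2.0        # sampling rate: 2X Hz, with X = 1 Hz assumed for illustration
f_low = 0.25    # a frequency process A could produce (below X Hz)
f_high = 10.25  # a frequency process B can produce (well above X Hz)

# Sampled at fs, the 10.25 Hz signal aliases onto the 0.25 Hz one:
# the samples are identical, so the test cannot tell them apart.
for n in range(8):
    a = math.sin(2 * math.pi * f_low * n / fs)
    b = math.sin(2 * math.pi * f_high * n / fs)
    assert abs(a - b) < 1e-6
```

The frequencies here are arbitrary; any f_high that differs from f_low by an integer multiple of fs aliases the same way.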

  • My definition of intelligence is the capability to process and formalize a deterministic action from given inputs as a transferable entity/medium. In other words: knowing how to manipulate the world, directly and indirectly, via deterministic actions and known inputs, and being able to teach others via various mediums. For example, you can be very intelligent at software programming but socially very dumb (for example, unable to socially influence others).

    For example, if you do not understand another person's language, and you understand neither the person's work nor its influence, then you have no basis for assessing that person's intelligence beyond your prior assumptions about how smart humans are.

    ML/AI on text inputs is at best stochastic over a language context window, or plain wrong, so it does not satisfy the definition. Well (formally) specified problems with a smaller scope tend to work well, from what I've seen so far. The working ML/AI applications known to me are calibration/optimization problems.

    What is your definition?

    • Forming deterministic actions is a sign of computation, not intelligence. Intelligence probably (I guess) depends on nondeterministic actions.

      Computation is when you query a machine that is on standby, doing nothing, and it computes a deterministic answer. Intelligence (or at least some sign of it) is when the machine queries you, the operator, of its own volition.


    • > My definition of intelligence is the capability to process and formalize a deterministic action from given inputs as transferable entity/medium.

      I don't think that's a good definition, because many deterministic processes - including those at the core of important problems, such as those pertaining to the economy - are highly non-linear, and we don't necessarily think that "more intelligence" is what's needed to simulate them better. I mean, we've proven that predicting certain things (even those that require nothing but deduction) requires more computational resources regardless of the algorithm used for the prediction. Formalising a process, i.e. inferring the rules from observation through induction, may also be dependent on available computational resources.

      > What is your definition?

      I don't have one except for "an overall quality of the mental processes humans present more than other animals".


  • Fair, I think it would be more appropriate to say higher capacity.

    • Ok, but the point of a test of this kind is to generalise its result. I.e. the whole point of an intelligence test is that we believe a human getting a high score on such a test is more likely to do some useful things not on the test better than a human with a low score. But if the problem is that the test results - as you said - don't generalise as we expect them to, then the tests are not very meaningful to begin with. If we don't know what to expect from a machine with a high test score when it comes to doing things not on the test, then the only "capacity" we're measuring is the capacity to do well on such tests, and that's not very useful.

> this implies higher intelligence

Models aren't intelligent, the intelligence is latent in the text (etc) that the model ingests. There is no concrete definition of intelligence, only that humans have it (in varying degrees).

The best you can really state is that a model extracts/reveals/harnesses more intelligence from its training data.

  • > There is no concrete definition of intelligence

    Note that if this is true (and it is!) all the other statements about intelligence and where it is and isn’t found in the post (and elsewhere) are meaningless.

    • I did notice that: the person you replied to made a categorical statement about intelligence, followed immediately by negating that there is anything to make a concrete statement about.

"Scaling" is going to eventually apply to the ability to run more and higher fidelity simulations such that AI can run experiments and gather data about the world as fast and as accurately as possible. Pre-training is mostly dead. The corresponding compute spend will be orders of magnitude higher.

  • That's true. I expect more inference-time scaling, and hybrid inference/training-time scaling once there's continual learning, rather than scaling model size or pretraining compute.

    • Simulation scaling will be the most insane though. Simulating "everything" at the quantum level is impossible and the vast majority of new learning won't require anything near that. But answers to the hardest questions will require as close to it as possible so it will be tried. Millions upon millions of times. It's hard to imagine.

  • > Pre-training is mostly dead.

    I don't think so. Serious attempts at producing data specifically for training haven't been made yet. High-quality data, I mean, produced by anarcho-capitalists, not by corporations like Scale AI using workers governed by the laws of a nation, etc.

    Don't underestimate the determination of a million young people to produce perfect data within 24 hours to train a model to vacuum-clean their house, if it means they never have to do it themselves again, and maybe earn a little money on the side by creating the data.

    I agree with the other part of the comment.

> most people don't have a use case of a model needing phd level research ability.

Models also struggle at not fabricating references or entire branches of science.

edit: "needing phd level research ability [to create]"?

Counting letters is tricky for LLMs because they operate on tokens, not letters. From the perspective of an LLM, if you ask it "this is a sentence, count the letters in it", it doesn't see a stream of characters like we do; it sees [851, 382, 261, 21872, 11, 3605, 290, 18151, 306, 480].
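A minimal sketch of the mismatch (the token IDs are the ones quoted above; the exact split depends on the tokenizer):

```python
# The question is asked about characters, but the model never receives them.
sentence = "this is a sentence, count the letters in it"

# What we (and plain code) see: individual characters, trivially countable.
letter_count = sum(1 for c in sentence if c.isalpha())
print(letter_count)  # 34

# What the model sees: opaque token IDs. A single ID can span a whole word
# ("sentence"), so letter boundaries simply aren't present in its input.
token_ids = [851, 382, 261, 21872, 11, 3605, 290, 18151, 306, 480]
print(len(token_ids))  # 10
```

The model would have to have memorized the spelling behind each ID to answer correctly, which is why such questions trip it up.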