
Comment by myrmidon

6 days ago

> The current generation of AI models will turn out to be essentially a dead end.

It seems a matter of perspective to me whether you call it "dead end" or "stepping stone".

To give some pause before dismissing the current state of the art prematurely:

I would already consider current LLM-based systems more "intelligent" than a housecat. And a pet's intelligence is enough to have ethical implications, so we have arguably reached a very important milestone already.

I would argue that the biggest limitation on current "AI" is that it is architected to not have agency; if you had GPT-3-level intelligence in an easily anthropomorphizable package (Furby-style, capable of emoting/communicating by itself), public outlook might shift drastically without any real technical progress.

I think the main thing I want from an AI in order to call it intelligent is the ability to reason: I provide an explanation of how long multiplication works, and then the AI is capable of multiplying arbitrarily large numbers. And - correct me if I am wrong - large language models cannot do this. And this despite probably being exposed to a lot of mathematics during training, whereas in a strong version of this test I would want nothing related to long multiplication in the training data.
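
To make the test concrete, this is the procedure I would explain to the model, written out as a minimal Python sketch (the function name and structure are mine, purely illustrative of the schoolbook algorithm):

    # Schoolbook long multiplication: multiply one factor by each digit of the
    # other, shift by that digit's place value, then add everything up.
    def long_multiply(a: int, b: int) -> int:
        total = 0
        for position, digit in enumerate(reversed(str(b))):
            partial = a * int(digit)            # one intermediate product per digit
            total += partial * 10 ** position   # shift, then accumulate
        return total

    assert long_multiply(794206, 43124) == 794206 * 43124

Nothing in this procedure gets conceptually harder for longer numbers; there are just more digits to loop over.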

  • I'm not sure if popular models cheat at this, but when I ask for it (o3-mini) I get correct results/intermediate values (for 794206 * 43124, chosen randomly; a reference check is sketched at the end of this comment).

    I do suspect this is only achievable because the model was specifically trained for this.

    But the same is true for humans; children can't really "reason themselves" into basic arithmetic-- that's a skill that requires considerable training.

    I do concede that this (learning/skill acquisition) is something that humans can do "online" (within days/weeks/months) while LLMs need a separate process for it.

    > in a strong version of this test I would want nothing related to long multiplication in the training data.

    Is this not a bit of a double standard? I think at least 99/100 humans with minimal previous math exposure would utterly fail this test.
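
    For anyone who wants to check the model's output themselves, the reference values can be generated with plain Python (no model involved); the loop just spells out the schoolbook intermediate products:

        a, b = 794206, 43124
        print(a * b)  # reference result to compare the model's final answer against
        # one intermediate product per digit of b, as in long multiplication
        for position, digit in enumerate(reversed(str(b))):
            print(f"{a} * {digit} * 10^{position} = {a * int(digit) * 10 ** position}")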

    • I just tested this with Copilot on two random 45-digit numbers, and it gets the answer right by translating the problem into Python and running it in the background. When I asked it not to use any external tools, it got the first five digits, the last two, and a handful more in the middle correct, out of 90. It also failed to calculate the 45 intermediate products - one number times one digit of the other - even getting some of the multiplications by zero and one wrong.

      The models can do surprisingly large numbers correctly, but they have essentially memorized those results. As you make the numbers longer and longer, the output becomes garbage. If they actually reasoned about it, this would not happen: multiplying those long numbers is not really harder than multiplying two-digit numbers, just more time-consuming and annoying.

      And I do not want the model to figure multiplication out on its own; I want to provide it with what teachers tell children up to the point they learn long multiplication. The only place I want to push the AI further is to do it for much longer numbers, not just the two, three, or four digits you do in primary school.

      And the difference is not only online vs. offline learning: large language models have almost certainly been trained on heaps of basic mathematics, yet did not learn to multiply. They can explain to you how to do it, because they have seen countless explanations and examples, but they cannot actually do it themselves.
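
      For anyone wanting to repeat the experiment, here is roughly how one could generate the ground truth to compare a model's answer against (a small sketch; the helper name and formatting are mine):

          import random

          # a random 45-digit number (leading digit nonzero so the length is exactly 45)
          def random_45_digit() -> int:
              digits = [str(random.randint(1, 9))] + [str(random.randint(0, 9)) for _ in range(44)]
              return int("".join(digits))

          a, b = random_45_digit(), random_45_digit()
          print(a * b)  # reference product, 89-90 digits
          # the 45 intermediate products: one factor times a single digit of the other
          for digit in reversed(str(b)):
              print(a * int(digit))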


Intelligence alone does not have ethical implications w.r.t. how we treat the intelligent entity. Suffering has ethical implications, but intelligence does not imply suffering. There's no evidence that LLMs can suffer (and note that this is less evidence than we have for, say, crayfish suffering).

  • While I agree that suffering has ethical connotations, I don't think it makes sense to treat it as a requirement. A Buddhist who manages to achieve enlightenment and overcome suffering does not immediately cease to be a moral patient, right?

If you asked your cat to make a REST API call I suppose it would fail, but the same applies if you asked a chatbot to predict real-time prey behavior.

  • I think LLMs are much closer to grasping movement prediction than the cat is to learning English, for what it's worth.

    IMO "ability to communicate" is a somewhat fair proxy for intelligence (even if it does not capture all of an animals capabilities), and current LLMs are clearly superior to any animal in that regard.

> I would already consider current LLM-based systems more "intelligent" than a housecat.

An interesting experiment would be to have a robot with an LLM mind and see what things it could figure out, like whether it would learn to charge itself. But personally I don't think they have anywhere near the general intelligence of animals.