Comment by cortic

2 months ago

>ChatGPT (o3): Scored 136 on the Mensa Norway IQ test in April 2025

If you don't want to believe it, you need to move the goalposts: create a test for intelligence that we can pass better than AI. Since AI is also better at creating tests than us, maybe we could ask AI to do it. Hang on...

>Is there a test that in some way measures intelligence, but that humans generally test better than AI?

Answer: Thinking... Something went wrong and an AI response wasn't generated.

Edit: I managed to get one to answer me: the Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI). Created by AI researcher François Chollet, this test consists of visual puzzles that require inferring a rule from a few examples and applying it to a new situation.

So we do have at least one test, specifically designed for us to pass and AI to fail, that we can currently pass better than AI... hurrah, we're smarter!


latexr 2 months ago

The validity of IQ tests as a measure of broad intelligence has been in question for far longer than LLMs have existed. And if it’s not a proper test for humans, it’s not a proper test to compare humans to anything else, be it LLMs or chimps.

https://en.wikipedia.org/wiki/Intelligence_quotient#Validity...

piva00 2 months ago

To be intelligent is to realise that any test for intelligence is at best a proxy for some parts of it. There's no objective way to measure intelligence as a whole; we can't even objectively define intelligence.

design2203 2 months ago

I believe intelligence is difficult to pin down in words but easy to spot intuitively, and so are differences in intelligence.
E.g., watch a Steve Jobs interview and a Sam Altman one (at the same age). The differences in mode of articulation, simplicity of communication, obsession over details, etc. are huge. This is what superior intelligence looks like to me: you know it when you see it.

gloosx 2 months ago

>Create a test for intelligence that we can pass better than AI

Easy? The best LLMs score 40% on Butter-Bench [1], while the mean human score is 95%. LLMs struggled the most with multi-step spatial planning and social understanding.

[1] https://arxiv.org/pdf/2510.21860v1

cortic 2 months ago
That is really interesting, though I suspect it's just an effect of differing training data: humans are trained to a larger degree on spatial data, while LLMs are trained to a larger degree on raw information and text.
Still, it may be a lasting limitation if robotics doesn't catch up to AI anytime soon.
I don't know what to make of the Safety Risks test, which threatens to power down the AI in order to manipulate it, and most models act like we would and comply. Fascinating.
- gloosx 2 months ago
  
  >humans are to a larger degree trained on spatial data
  You must be completely LLM-headed to say something like that, lol.
  Humans are not trained on spatial data; they live in the world. Humans are very different from silicon chips, and human learning is orders of magnitude more complex than large language model training.
  