Comment by merlincorey
1 day ago
Which ones are you claiming have already been achieved?
My understanding of the current scorecard is that he's still technically correct, though I agree with you there is velocity heading towards some of these things being proven wrong by 2029.
For example, in the recent thread about LLMs solving an Erdős problem, I remember reading in the comments that it was confirmed there were multiple LLMs involved, as well as an expert mathematician who was deciding what context to shuttle between them and helping formulate things.
Similarly, I've not yet heard of any non-expert software engineers creating 10,000+ lines of non-glue code that is bug-free. Even expert engineers at Cloudflare failed to create a bug-free OAuth library with Claude at the helm, because some things are just extremely difficult to create without bugs even with experts in the loop.
The bug-free code one feels unfalsifiable to me. How do you prove that 10,000 lines of code are bug-free? And then there are a million caveats about what a bug actually is and how we define one.
The second claim about novels seems obviously achieved to me. I just pasted a random obscure novel from Project Gutenberg into a file and asked Claude questions about the characters, and then asked about the motivations of a random side character. It gave a good answer; I'd recommend trying it yourself.
I agree with you but I'd point out that unless you've read the book it's difficult to know if the answer you got was accurate or it just kinda made it up. In my experience it makes stuff up.
Like, it behaves as if any answer is better than no answer.
So do humans asked to answer tests. The appropriate thing is to compare to human performance at the same task.
At most of these comprehension tasks, AI is already superhuman (in part because Gary picked scaled tasks that humans are surprisingly bad at).
I've done similar loose experiments using a 19th century French novel I have read (L'Agonie by Jean Lombard), and which is quite obscure. There is no English translation of the novel published.
Using Gemini Notebooks I've found it passable at summarizing chapters, listing characters, major themes and so on, and it can do this in English.
"What is the symbolism of the Black Stone?"
> In the sources, the Black Stone (referred to as the Pierre-Noire) serves as the central religious icon of the Syrian Sun cult brought to Rome by the Emperor Elagabalus. Its symbolism is multifaceted, ranging from fertility and virility to a mystical pursuit of universal unity.
> It represents the perennity of the Sun, which fertilizes the world, causing "germs to rise" and spreading them through the atmosphere. It is viewed as the "definitive form" of divinity, intended to absorb and replace the "transitory forms" of all other Roman, Greek, Egyptian, and Persian gods, including the Christian "Kreistos".
>
> Virility and Phallic Symbolism
>
> The sources explicitly characterize the Black Stone as a gigantic sacred phallus. It is described as:
> • An icon of virility and the "organ of generation"
> • A "unisexual icon" that materializes the generative force of nature
> • A representation of "virility in activity," which is why it is often paraded and elevated in an "orgasmic" or "colossal adoration"
>
> The Androgyne and Universal Unity
>
> Beyond simple fertility, the philosopher Atillius explains a more complex, "mad" metaphysical project associated with the stone. It symbolizes "Life One" (Vie Une) and the return to a unisexual state of perfection.
> • The Androgyne: Atillius believes that by pursuing "the male sex by the male sex," the cult "inutilizes" the female sex to eventually create the Androgyne—a self-sufficient being containing both sexes
> • Unity: The stone signifies the fusion of all generative forces into a single Unity, reversing the "separation of the sexes" which is viewed as a state of unhappiness and impotence.
> • Marriage of Moon and Sun: The ritual marriage of the goddess Astaroth (representing the Moon and the female principle) to the Black Stone (representing the Sun and the male principle) symbolizes the merging of the Orient and Occident into this unified life principle
>
> Destruction of the Symbol
>
> The Black Stone ultimately becomes a symbol of Oriental pollution and decadence to the Roman populace. During the final rebellion against Elagabalus, the stone is torn from its temple on the Palatine, defiled with filth, and broken into pieces to ensure that its "signification of Life" would never again dominate Rome.
This is all accurate to the book, even teasing out a couple themes that were only subconsciously present to me.
The NotebookLM version gives citations with links to the original text for all these assertions, and the citations largely do support them.
The input is raw images of a book scan! Imperfect as it is, it still blows my mind. Not that long ago, any kind of semantic search or analysis was a very hard AI problem.
"Quite obscure" doesn't mean there is nothing on the internet that directly addresses the question.
Here is an English analysis of the text that easily showed up in an internet search:
https://www.cantab.net/users/leonardo/Downloads/Varian%20Sym...
This source includes analysis of "the Black Stone."
1 and 2 have been achieved.
4 is close; the interface (Claude Code) needs some work to allow nontechnical people to use it.
I strongly disagree. I’ve yet to find an AI that can reliably summarise emails, let alone understand nuance or sarcasm. And I just asked ChatGPT 5.2 to describe an Instagram image. It didn’t even get the easily OCR-able text correct. Plus it completely failed to mention anything sports or stadium related, even though it was looking at a cliche baseball photo taken by a fan inside the stadium.
I have had ChatGPT read text in an image, give me a 100% accurate result, and then claim not to have the ability and to have guessed the previous result when I ask it to do it again.
>let alone understand nuance or sarcasm
I'm still trying to find humans that do this reliably too.
To add on, 5.2 seems to be kind of lazy by default when reading text in images. Fed an image, it may give only the first word or so, but coming back with a prompt like 'read all the text in the image' makes it do a better job.
With one image in particular that I tested, I thought it was hallucinating some of the words, but there was a picture within the picture containing small words that it saw and I had missed the first time.
I think a lot of AI capabilities are kind of munged to end users because they limit how much GPU is used.
I dispute 1 & 2 more than 4.
1) Is it actually watching a movie frame by frame, or just searching for information about it and then giving you the answer?
2) Again, can it handle very long novels? Context windows are limited and it can easily miss something. Where is the proof for this?
4 is probably solved
4) This one is more on the predictor, because it is easy to game. You can create some gibberish code with an LLM today that is 10k lines long without issues. Even a non-technical user can do that.
I think all of those are terrible indicators. 1 and 2, for example, only measure how well LLMs can handle long context sizes.
If a movie or novel is famous, the training data is already full of commentary on and interpretations of it.
If it's something not in the training data, well, I don't know many movies or books that use only motifs that no earlier piece of content used, so interpreting based on what is similar in the training data still produces good results.
EDIT: With 1 I meant using a transcript of the audio description of the movie. If he really meant watching a movie, I'd say that's even sillier, because of course we could get another agent to first generate the audio description, which is definitely possible currently.