Comment by MyFirstSass

2 days ago

Based on Tao’s description of how the proof came about - a human carrying results back and forth between two separate AI tools and using an AI tool to fill in gaps the human found?

I don’t think it can really be said to have occurred autonomously then?

Looks more like a 50/50 partnership with a super-expert human on the one side, which makes this far more ambiguous in my opinion - and in line with my own AI tests, i.e. they are pretty stupid, even Opus 4.5 or whatever, unless you're already an expert doing boilerplate.

EDIT: I can see the title has now been fixed from "solved" to "more or less solved", which I still think is a big stretch.

You're understanding correctly: this is back and forth between Aristotle and ChatGPT and a (very smart) user.

  • I'm not sure I understand the wild hype in this thread, then.

    Seems exactly like the tests at my company, where even frontier models are revealed to be very expensive rubber ducks that completely fail with non-experts or anything novel or math-heavy.

    I.e. they mirror the intellect of the user but give you big dopamine hits that'll lead you astray.

    • Yes, the contributions of the people prompting the AI should be considered, as well as those of the people who designed the Lean libraries used in the loop while the AI was writing the solution. Any talk of "AGI" is, as always, ridiculous.

      But speaking as a specialist in theorem proving, this result is pretty impressive! It would likely have taken me a lot longer to formalize this even if it were in my area of specialty.


    • This accurately mirrors my experience. It has never - so far - happened that the AI brought a novel insight at the level of what I would consider an original idea. Presumably the case of TFA is different, but the normal interaction is that the solution to whatever you are trying to solve is a millimeter away from your understanding, and the AI won't bridge that gap until you do it yourself - and then it will usually prove to you that it was obvious. If it was so obvious, then it probably should have made the suggestion...

      Recent case:

      I have a bar, supported at either end, with a number of weights on it:

      |---+-+-//-+-+---|

      What order and/or arrangement of removing the weights would cause the least shift in the center of mass? There is a non-obvious trick you can pull here to reduce the shift considerably, and I was curious whether the AI would spot it. But even after lots of prompting it just circled around the obvious solutions rather than making a leap outside of that box and coming up with a solution that is better in every case.

      I wonder what the cause of that kind of blindness is.
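
      A minimal brute-force sketch of the puzzle as I read it, just to make the question precise (the bar mass, positions, and weights below are invented for illustration - the original numbers aren't given - and enumerating permutations of course doesn't reveal the trick):

        from itertools import permutations

        # Hypothetical setup: a uniform bar centred at 0, plus point
        # weights at made-up positions along it.
        BAR_MASS, BAR_COM = 10.0, 0.0
        positions = (-4.0, -3.0, -1.0, 1.0, 3.0, 4.0)
        masses    = ( 2.0,  1.0,  1.0, 1.0, 1.0, 2.0)

        def com(remaining):
            # Centre of mass of the bar plus the weights still on it.
            m = BAR_MASS + sum(masses[i] for i in remaining)
            x = BAR_MASS * BAR_COM + sum(masses[i] * positions[i] for i in remaining)
            return x / m

        start = com(range(len(positions)))

        def worst_shift(order):
            # Largest |COM - start| seen at any point while removing
            # the weights in the given order.
            remaining = set(range(len(positions)))
            worst = 0.0
            for i in order:
                remaining.remove(i)
                worst = max(worst, abs(com(remaining) - start))
            return worst

        best = min(permutations(range(len(positions))), key=worst_shift)
        print("best order:", best, "max shift:", worst_shift(best))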


    • In other words, LLMs work best when "you are absolutely right" and "this is a very insightful question" are actually true.

    • Lots of users seem to think LLMs think and reason, so this sounds wonderful. A mechanical process isn't thinking, and it certainly does NOT mirror human thinking; the processes are altogether different.

    • Do you have any idea how many people here have paychecks that depend on the hype, or hope to be in that position? They were the same way for Crypto until it stopped being part of the get-rich-quick dream.

    • > I.e. they mirror the intellect of the user but give you big dopamine hits that'll lead you astray.

      This hits so true to home. Just today in my field, a manager without expertise in a topic gave me an AI solution to something I am an expert in. The AI was very plainly and painfully wrong, but it came down to the user prompting really poorly. When I gave a well-formulated prompt on the same topic, I got the correct answer on the first go.

    • "the more interesting capability revealed by these events is the ability to rapidly write and rewrite new versions of a text as needed, even if one was not the original author of the argument." From the Tao thread. The ability to quickly iterate on research is a big change because "This is sharp contrast to existing practice where....large-scale reworking of the paper often avoided due both to the work required and the large possibility of introducing new errors."

  • Exactly "The Geordi LaForge Paradox" of "AI" systems. The most sophisticated work requires the most sophisticated user, who can only become sophisticated the usual way --- long hard work, trial and error, full-contact kumite with reality, and a degree of devotion to the field.

  • https://www.erdosproblems.com/forum/thread/728#post-2808

    > There seems to be some confusion on this so let me clear this up. No, after the model gave its original response, I then proceeded to ask it if it could solve the problem with C=k/logN arbitrarily large. It then identified for itself what both I and Tao noticed about it throwing away k!, and subsequently repaired its proof. I did not need to provide that observation.

    So it was literally "yo, your proof is weak!" - "naah, watch this! [proceeds to give full proof all on its own]".

    I'd say that counts.

I had the impression that Tao and the community weren't even finding the gaps themselves, since they mentioned using an automatic proof verifier, and that the main back and forth involved re-reading Erdős' paper to figure out which problem Erdős actually intended. So more like 90/10 LLM/human. Maybe I misread it.

"This website was made by Thomas Bloom, a mathematician who likes to think about the problems Erdős posed. Technical assistance with setting up the code for the website was provided by ChatGPT" - from the FAQ.

> EDIT: I can see the title has now been fixed from "solved" to "more or less solved", which I still think is a big stretch.

"solved more or less autonomously by AI" were Tao's exact words, so I think we can trust his judgment about how much work he or the AI did, and how this indicates a meaningful increase in capabilities.