Comment by D-Machine

1 month ago

You're understanding correctly, this is back and forth between Aristotle and ChatGPT and a (very smart) user.

53 comments

D-Machine

I'm not sure i understand the wild hype here in this thread then.

Seems exactly like the tests at my company where even frontier models are revealed to be very expensive rubber ducks, but completely fails with non experts or anything novel or math heavy.

Ie. they mirror the intellect of the user but give you big dopamine hits that'll lead you astray.

markusde 1 month ago
Yes, the contributions of the people promoting the AI should be considered, as well as the people who designed the Lean libraries used in-the-loop while the AI was writing the solution. Any talk of "AGI" is, as always, ridiculous.
But speaking as a specialist in theorem proving, this result is pretty impressive! It would have likely taken me a lot longer to formalize this result even if it was in my area of specialty.
- falcor84 1 month ago
  
  > Any talk of "AGI" is, as always, ridiculous.
  How did you arrive at "ridiculous"? What we're seeing here is incredible progress over what we had a year ago. Even ARC-AGI-2 is now at over 50%. Given that this sort of process is also being applied to AI development itself, it's really not clear to me that humans would be a valuable component in knowledge work for much longer.
  
  26 replies →
jacquesm 1 month ago
This accurately mirrors my experience. It never - so far - has happened that the AI brought any novel insight at the level that I would see as an original idea. Presumably the case of TFA is different but the normal interaction is that that the solution to whatever you are trying to solve is a millimeter away from your understanding and the AI won't bridge that gap until you do it yourself and then it will usually prove to you that was obvious. If it was so obvious then it probably should have made the suggestion...
Recent case:
I have a bar with a number of weights supported on either end:
|---+-+-//-+-+---|
What order and/or arrangement or of removing the weights would cause the least shift in center-of-mass? There is a non-obvious trick that you can pull here to reduce the shift considerably and I was curious if the AI would spot it or not but even after lots of prompting it just circled around the obvious solutions rather than to make a leap outside of that box and come up with a solution that is better in every case.
I wonder what the cause of that kind of blindness is.
- ogogmad 1 month ago
  
  The problem is unclear. I think you have a labelled graph G=(V, E) with labels c:V->R, such that each node in V consists of a triple (L, R, S) where L is a sequence of weights are on the left, R is a sequence of weights that are on the right, and S is a set of weight that have been taken off. Define c(L, R, S) to be the centre of mass. Introduce an undirected edge e={(L, R, S), (L', R', S')} between (L, R, S) and (L', R', S') either if (i) (L', R', S') results from taking the first weight off L and adding it to S, or (ii) (L', R', S') results from taking the first weight off R and adding it to S, or (iii) (L', R', S') results from taking a weight from W and adding it to L, or (iv) (L', R', S') results from taking a weight from W and adding it to R.
  There is a starting node (L_0, R_0, {}) and an ending node ({}, {}, W) , with the latter having L=R={}.
  I think you're trying to find the path (L_n, R_n, S_n) from the starting node to the ending node that minimises the maximum absolute value of c(L_n, R_n, S_n).
  I won't post a solution, as requested.
  
  2 replies →
- jiggawatts 1 month ago
  
  That problem is not clearly stated, so if you’re pasting that into an AI verbatim you won’t get the answer you’re looking for.
  My guess is: first move the weights to the middle, and only then remove them.
  However “weights” and “bar” might confuse both machines and people into thinking that this is related to weight lifting, where there’s two stops on the bar preventing the weights from being moved to the middle.
  
  6 replies →
- TeodorDyakov 1 month ago
  
  Tokenizationnnnnnn
krzat 1 month ago

In other words, LLMs work best when *you are absolutely right" and "this is a very insightful question" are actually true.
SecretDreams 1 month ago

> Ie. they mirror the intellect of the user but give you big dopamine hits that'll lead you astray.
This hits so true to home. Just today in my field a manager without expertise in a topic gave me an AI solution to something I am an expertise in. The AI was very plainly and painfully wrong, but it comes down to the user prompting really poorly. When I gave a el formulated prompt to the same topic, I got the correct answer on the first go.
encyclopedism 1 month ago

Lots of users seem to think LLM's think and reason so this sounds wonderful. A mechanical process isn't thinking, certainly it does NOT mirror human thinking. The processes being altogether different.
EA-3167 1 month ago

Do you have any idea how many people here have paychecks that depend on the hype, or hope to be in that position? They were the same way for Crypto until it stopped being part of the get-rich-quick dream.
HDThoreaun 1 month ago

"the more interesting capability revealed by these events is the ability to rapidly write and rewrite new versions of a text as needed, even if one was not the original author of the argument." From the Tao thread. The ability to quickly iterate on research is a big change because "This is sharp contrast to existing practice where....large-scale reworking of the paper often avoided due both to the work required and the large possibility of introducing new errors."
Davidzheng 1 month ago
The proof is ai generated?
- MyFirstSass 1 month ago
  
  Eh? The text reads:
  "Aristotle integrates three main components: a Lean proof search system, an informal reasoning system that generates and formalizes lemmas, and a dedicated geometry solver"
  Not saying it's not an amazing setup, i just don't understand the word "AI" being used like this when it's the setup / system that's brilliant in conjunction with absolute experts.
  
  2 replies →
anthem2025 1 month ago

[dead]

adityaathalye 1 month ago

Exactly "The Geordi LaForge Paradox" of "AI" systems. The most sophisticated work requires the most sophisticated user, who can only become sophisticated the usual way --- long hard work, trial and error, full-contact kumite with reality, and a degree of devotion to the field.

NooneAtAll3 1 month ago

https://www.erdosproblems.com/forum/thread/728#post-2808

> There seems to be some confusion on this so let me clear this up. No, after the model gave its original response, I then proceeded to ask it if it could solve the problem with C=k/logN arbitrarily large. It then identified for itself what both I and Tao noticed about it throwing away k!, and subsequently repaired its proof. I did not need to provide that observation.

so it was literally "yo, your proof is weak!" - "naah, watch this! [proceeds to give full proof all on its own]"

I'd say that counts