
Comment by MyFirstSass

2 days ago

I'm not sure I understand the wild hype in this thread, then.

It seems exactly like the tests at my company, where even frontier models are revealed to be very expensive rubber ducks that completely fail with non-experts or anything novel or math-heavy.

I.e., they mirror the intellect of the user but give you big dopamine hits that'll lead you astray.

Yes, the contributions of the people promoting the AI should be considered, as well as the people who designed the Lean libraries used in-the-loop while the AI was writing the solution. Any talk of "AGI" is, as always, ridiculous.

But speaking as a specialist in theorem proving, this result is pretty impressive! It would likely have taken me a lot longer to formalize this result, even if it were in my area of specialty.

  • > Any talk of "AGI" is, as always, ridiculous.

    How did you arrive at "ridiculous"? What we're seeing here is incredible progress over what we had a year ago. Even ARC-AGI-2 is now at over 50%. Given that this sort of process is also being applied to AI development itself, it's really not clear to me that humans would be a valuable component in knowledge work for much longer.

    • It requires constant feedback, critical evaluation, and checks. This is not AGI, it's cognitive augmentation. One that is collective, one that will accelerate human abilities far beyond what the academic establishment is currently capable of, but one that is still fundamentally organic. I don't see a problem with this--AGI advocates treat machine intelligence like some sort of God that will smite non-believers and reward the faithful. This is what we tell children so that they won't shit their beds at night, otherwise they get a spanking. The real world is not composed of rewards and punishments.

      7 replies →

    • > it's really not clear to me that humans would be a valuable component in knowledge work for much longer.

      To me, this sounds like when we first went to the moon, and people were sure we'd be on Mars by the end of the '80s.

      > Even ARC-AGI-2 is now at over 50%.

      Any measure of "are we close to AGI" is as scientifically meaningful as "are we close to a warp drive" because all anyone has to go on at this point is pure speculation. In my opinion, we should all strive to be better scientists and think more carefully about what an observation is supposed to mean before we tout it as evidence. Despite the name, there is no evidence that ARC-AGI tests for AGI.

      2 replies →

This accurately mirrors my experience. It never - so far - has happened that the AI brought any novel insight at the level that I would see as an original idea. Presumably the case of TFA is different, but the normal interaction is that the solution to whatever you are trying to solve is a millimeter away from your understanding, and the AI won't bridge that gap until you do it yourself, and then it will usually tell you it was obvious. If it was so obvious then it probably should have made the suggestion...

Recent case:

I have a bar with a number of weights supported on either end:

|---+-+-//-+-+---|

What order and/or arrangement of removing the weights would cause the least shift in center of mass? There is a non-obvious trick you can pull here to reduce the shift considerably, and I was curious whether the AI would spot it, but even after lots of prompting it just circled around the obvious solutions rather than making a leap outside that box and coming up with a solution that is better in every case.

I wonder what the cause of that kind of blindness is.
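To make the "obvious" solution space concrete, here is a minimal brute-force sketch of the removal-only reading of the problem. The positions and masses are made up for illustration (the original comment doesn't specify them), and since it only searches removal orders, it deliberately does not touch the trick alluded to above:

```python
from itertools import permutations

# Hypothetical numbers for illustration only -- the real puzzle's weights
# are unspecified. This searches removal orders only, i.e. the "obvious"
# solution space, not the trick the comment alludes to.
positions = [-4.0, -3.0, -2.0, 2.0, 3.0, 4.0]
masses = [1.0] * len(positions)

def com(idxs):
    """Centre of mass of the weights still on the bar (0.0 when empty)."""
    total = sum(masses[i] for i in idxs)
    if total == 0:
        return 0.0
    return sum(positions[i] * masses[i] for i in idxs) / total

def worst_shift(order):
    """Largest |centre of mass| seen while removing weights in this order."""
    remaining = set(range(len(positions)))
    worst = abs(com(remaining))
    for i in order:
        remaining.remove(i)
        worst = max(worst, abs(com(remaining)))
    return worst

best = min(permutations(range(len(positions))), key=worst_shift)
best_cost = worst_shift(best)
print(best, best_cost)
```

With these symmetric example weights, an alternating outside-in order is among the optima: the shift is bounded by the position of the last weight left on the bar, which is why removal order alone can't do better than that.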

  • That problem is not clearly stated, so if you’re pasting that into an AI verbatim you won’t get the answer you’re looking for.

    My guess is: first move the weights to the middle, and only then remove them.

    However, “weights” and “bar” might confuse both machines and people into thinking that this is related to weightlifting, where there are two stops on the bar preventing the weights from being moved to the middle.

    • The problem is stated clearly enough that the humans we ask the question of will sooner or later see that there is an optimum, and that the optimum relies on understanding.

      And no, the problem is not 'not clearly stated'. It is complete as it is and you are wrong about your guess.

      And if machines and people think this is related to weightlifting, they're free to ask follow-up questions. But even in the weightlifting case the answer is the same.

      5 replies →

  • The problem is unclear. I think you have a labelled graph G=(V, E) with labels c:V->R, such that each node in V consists of a triple (L, R, S), where L is a sequence of weights that are on the left, R is a sequence of weights that are on the right, and S is a set of weights that have been taken off. Define c(L, R, S) to be the centre of mass. Introduce an undirected edge e={(L, R, S), (L', R', S')} between (L, R, S) and (L', R', S') either if (i) (L', R', S') results from taking the first weight off L and adding it to S, or (ii) (L', R', S') results from taking the first weight off R and adding it to S, or (iii) (L', R', S') results from taking a weight from W and adding it to L, or (iv) (L', R', S') results from taking a weight from W and adding it to R.

    There is a starting node (L_0, R_0, {}) and an ending node ({}, {}, W), where W is the set of all weights, so the ending node has L = R = {}.

    I think you're trying to find the path from the starting node to the ending node that minimises the maximum absolute value of c(L_n, R_n, S_n) over the nodes visited along the way.

    I won't post a solution, as requested.
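An aside on the formulation above: "minimise the maximum absolute value of c along the path" is a bottleneck (minimax) path objective, which a small variant of Dijkstra's algorithm handles. A generic sketch on a toy graph, with node names and labels invented purely for illustration (no puzzle solution here either):

```python
import heapq

def minimax_path_cost(graph, label, start, goal):
    """Smallest achievable max |label| over the nodes of a start->goal path.

    graph: dict mapping each node to its neighbours (undirected edges
    listed in both directions); label: dict mapping node -> real number,
    a stand-in for the centre of mass c of that state.
    """
    best = {start: abs(label[start])}
    heap = [(best[start], start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt in graph[node]:
            ncost = max(cost, abs(label[nxt]))
            if ncost < best.get(nxt, float("inf")):
                best[nxt] = ncost
                heapq.heappush(heap, (ncost, nxt))
    return None  # goal unreachable

# Toy example (invented): two routes from "start" to "goal"; the route
# through "mid_lo" keeps |label| at 1.0, the route through "mid_hi" hits 5.0.
graph = {
    "start": ["mid_lo", "mid_hi"],
    "mid_lo": ["start", "goal"],
    "mid_hi": ["start", "goal"],
    "goal": ["mid_lo", "mid_hi"],
}
label = {"start": 0.0, "mid_lo": 1.0, "mid_hi": 5.0, "goal": 0.0}
print(minimax_path_cost(graph, label, "start", "goal"))  # -> 1.0
```

The only change from plain Dijkstra is that a path's cost is the max of its node labels rather than the sum of its edge weights; the greedy extraction argument still goes through because max, like +, never decreases along an extension.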

In other words, LLMs work best when "you are absolutely right" and "this is a very insightful question" are actually true.

Lots of users seem to think LLMs think and reason, so this sounds wonderful to them. A mechanical process isn't thinking, and it certainly does NOT mirror human thinking; the processes are altogether different.

The proof is AI-generated?

  • Eh? The text reads:

    "Aristotle integrates three main components: a Lean proof search system, an informal reasoning system that generates and formalizes lemmas, and a dedicated geometry solver"

    Not saying it's not an amazing setup, I just don't understand the word "AI" being used like this when it's the setup / system that's brilliant, in conjunction with absolute experts.

Do you have any idea how many people here have paychecks that depend on the hype, or hope to be in that position? They were the same way for Crypto until it stopped being part of the get-rich-quick dream.

> Ie. they mirror the intellect of the user but give you big dopamine hits that'll lead you astray.

This hits so close to home. Just today in my field, a manager without expertise in a topic gave me an AI solution to something I am an expert in. The AI was very plainly and painfully wrong, but it came down to the user prompting really poorly. When I gave a well-formulated prompt on the same topic, I got the correct answer on the first go.

"the more interesting capability revealed by these events is the ability to rapidly write and rewrite new versions of a text as needed, even if one was not the original author of the argument." From the Tao thread. The ability to quickly iterate on research is a big change because "This is sharp contrast to existing practice where....large-scale reworking of the paper often avoided due both to the work required and the large possibility of introducing new errors."