Further human + AI + proof assistant work on Knuth's "Claude Cycles" problem

19 hours ago (twitter.com)

Knuth Claude's Cycles note update: problem now fully solved, by LLMs - https://news.ycombinator.com/item?id=47230710 - March 2026 (362 comments)

171 comments

mean_mistreater

vatsachak 16 hours ago

I've always said this but AI will win a fields medal before being able to manage a McDonald's.

Math seems difficult to us because it's like using a hammer (the brain) to twist in a screw (math).

LLMs are discovering a lot of new math because they are great at low depth high breadth situations.

I predict that in the future people will ditch LLMs in favor of AlphaGo style RL done on Lean syntax trees. These should be able to think on much larger timescales.

Any professional mathematician will tell you that their arsenal is ~ 10 tricks. If we can codify those tricks as latent vectors it's GG

pfdietz 3 minutes ago

> Any professional mathematician will tell you that their arsenal is ~ 10 tricks. If we can codify those tricks as latent vectors it's GG
And if we can train the systems to discover new tricks, whoa Nelly.
vatsachak 16 hours ago
Tricks are nothing but patterns in the logical formulae we reduce.
Ergo these are latent vectors in our brain. We use analogies like geometry in order to use Algebraic Geometry to solve problems in Number Theory.
An AI trained on Lean Syntax trees might develop it's own weird versions of intuition that might actually properly contain ours.
If this sounds far fetched, look at Chess. I wonder if anyone has dug into StockFish using mechanistic interpretability
- myffical 15 hours ago
  
  Some DeepMind researchers used mechanistic interpretability techniques to find concepts in AlphaZero and teach them to human chess Grandmasters: https://www.pnas.org/doi/10.1073/pnas.2406675122
- hodgehog11 14 hours ago
  
  This argument, that LLMs can develop new crazy strategies using RLVR on math problems (like what happened with Chess), turns out to be false without a serious paradigm shift. Essentially, the search space is far too large, and the model will need help to explore better, probably with human feedback.
  https://arxiv.org/abs/2504.13837
  
  9 replies →
- slopinthebag 16 hours ago
  
  Stockfish's power comes from mostly search, and the ML techniques it uses are mainly about better search, i.e. pruning branches more efficiently.
  
  9 replies →
hodgehog11 14 hours ago
As a professional mathematician, I would say that a good proof requires a very good representation of the problem, and then pulling out the tricks. The latter part is easy to get operating using LLMs, they can do it already. It's the former part that still needs humans, and I'm perfectly fine with that.
- threethirtytwo 13 hours ago
  
  But are you ok with the trendline of ai improvement? The speed of improvement indicates humans will only get further and further removed from the loop.
  I see posts like your all the time comforting themselves that humans still matter, and every-time people like you are describing a human owning an ever shrinking section of the problem space.
  
  5 replies →
madrox 15 hours ago
> I've always said this but AI will win a fields medal before being able to manage a McDonald's.
I love this and have a corollary saying: the last job to be automated will be QA.
This wave of technology has triggered more discussion about the types of knowledge work that exist than any other, and I think we will be sharper for it.
- bitwize 15 hours ago
  
  The ownership class will be sharper. They will know how to exploit capital and turn it into more capital with vastly increased efficiency. Everybody else will be hosed.
  
  4 replies →
Yoric 4 hours ago

> I predict that in the future people will ditch LLMs in favor of AlphaGo style RL done on Lean syntax trees. These should be able to think on much larger timescales.
This is certainly my hope.
In my spare time, I'm slowly, very slowly, inching towards a prototype of something that could work like that.
ryanar 15 hours ago
Are they actually producing new math? In the most recent ACM issue there was an article about testing AI against a math bench that was privately built by mathematicians, and what they found is that even though AI can solve some problems, it never truly has come up with something novel and new in mathematics, it is just good at drawing connections between existing research and putting a spin on it.
- in-silico 11 hours ago
  
  I'm not accusing you in particular, but I feel like there's a lot of circular reasoning around this point. Something like: AI can't discover "new math" -> AI discovers something -> since it was discovered by AI it must not be "new math" -> AI can't discover "new math"
  For example, there was a recent post here about GPT-5.4 (and later some other models) solving a FrontierMath open problem: https://news.ycombinator.com/item?id=47497757
  That would definitely be considered "new math" if a human did it, but since it was AI people aren't so sure.
  
  4 replies →
- hodgehog11 15 hours ago
  
  It's finding constructions and counterexamples. That's different from finding new proof techniques, but still extremely useful, and still gives way to novel findings.
3abiton 14 hours ago

It will be heavily still reliant onexpert human input and interactions. Knuth is an expert, and know how to guide.
smokel 16 hours ago
I think this is mostly about existing legislature, not about technology.
In any other context than when your paycheck depends on it, you would probably not be following orders from a random manager. If your paycheck depended on following the instructions of an AI robot, the world might start to look pretty scary real soon.
- jfim 10 hours ago
  
  > If your paycheck depended on following the instructions of an AI robot, the world might start to look pretty scary real soon.
  That's already the case, minus AI, for gig workers. Their only agency is to accept or decline a ride/delivery, the rest is follow instructions.
- vatsachak 16 hours ago
  
  There's a lot to being a manager
  - Coherent customer interaction
  - Common sense judgements
  - Scheduling
  - Quality control
  All which are baked into humans but not so much into LLMs
  Even if it were legal to have an LLM as a GM, I think it would fair poorly
- throw3747488 15 hours ago
  
  AI actually has to follow all rules, even the bad rules. Like when autonomous car drives super carefully.
  Imagine mcdonald management would enforce dog related rules. No more filthy muppets! If dog harasses customers, AI would call cops, and sue for restraining order! If dog defecates in middle of restaurant, everything would get desinfected, not just smeared with towels!
  Nutters would crucify AI management!
kelseyfrog 14 hours ago
As of now, no models have solved a Millennium Prize Problem[1].
1. https://mppbench.com/
- raincole 8 hours ago
  
  Most Fields medals winners haven't either, except one.
- utopcell 8 hours ago
  
  This is the real Litmus test isn't it? There will be a deafening silence from critics when AI decides P vs NP.
slopinthebag 16 hours ago
> AI will win a fields medal before being able to manage a McDonald's
Of course, because it takes multi-modal intelligence to manage a McDonalds. I.e. it requires human intelligence.
> I predict that in the future people will ditch LLMs in favor of AlphaGo style RL
Same for coding as well. LLM's might be the interface we use with other forms of AI though.
- vatsachak 16 hours ago
  
  Something like building Linux is more akin to managing a McDonald's than it is to a 10 page technical proof in Algebraic Groups.
  Programming is more multimodal than math.
  Something like performance engineering might be free lunch though
  
  17 replies →
NamlchakKhandro 16 hours ago
I've never seen you say that
- vatsachak 16 hours ago
  
  You will have to take my word that I started saying this in Dec 2024 lol

smithcoin 16 hours ago

When I was younger I remember a point of demarcation for me was learning the 4chan adage “trolls trolling trolls”, and approaching all internet interactions with skepticism. While I have been sure that Reddit for a while has succumbed to being “dead internet”. This thread is another moment for me- I can no longer recognize who is a bot, and who has honest intentions.

ansc 2 hours ago
That's not just a great insight, but one that's worth carrying with us. That's why I built RememberBuddy. It's the one place for you to store your every-day insights so you don't forget.
- podgorniy 2 hours ago
  
  What a great project idea, worth getting it clients. That's why I've build a platform for promoting young startapers to find theit niche audience. Give it a try and go to the moon!
  
  1 reply →

breatheoften 14 hours ago

Like so many things -- the evolution of AI math will I think follow trajectories hinted at in the 90s by the all time great sci-fi author Greg Egan. The nature of math won't change -- but the why of it definitely will. Egan imagined a future ai civilization in Diaspora where "math discovery" -- by nature in the future perhaps accurately described as "mechanistic math discovery" is modeled by society as a kind of salt mine environment in which you can dig for arbitrarily long amounts of time and find new nuggets. The nuggets themselves have a kind of "pure value" as mathematical objects even if they might not have any knowable value outside the mines. Some personalities were interested in and valued the nuggets for their own sake while others didn't but recognized that there were occasionally nuggets found in the mind that had broader appeal.

Research institutes like those founded by Terence Tao in our current present feel like they will align to this future almost perfectly on a long enough timeline -- tho I think on a shorter timeline this area of research is almost certain to provide a ton of useful ways to advance our current ai systems as our current systems are still in a state where literally anything that can generate new information that is "accurate" in some way -- like our current theorem prover engines are enormously valuable parts of our still manually curated training loops.

pks016 16 hours ago

Interesting but not surprising to me. Once a field expert guides the models, they most likely will reach a solution. The models are good at lazy work for experts. For hard or complicated questions, many a time the models have blind spots.

qnleigh 8 hours ago

In the paper, they give part of their system prompt:

> * After EVERY exploreXX.py run, IMMEDIATELY update this file [plan.md] before doing anything else. * No exceptions. Do not start the next exploration until the previous one is documented here.

Is this known to improve performance for advanced problem solving? If so, why this specific prompt?

Imanari 4 hours ago

Maybe to be better able to restart the process and not lose track.

manapause 3 hours ago

If you give 100 monkeys 100 guns and room full of building materials, how long will it take before they build a house?

How long will it take before they rob a bank?

If they do either of those things will the results have been intentional from the simian’s POV?

not_that_d 4 hours ago

Seems like we are ready heading to what the OpenAI CEO wanted "intelligence just available thru a subscription"

ftchd 15 hours ago

https://xcancel.com/BoWang87/status/2037648937453232504

monkpit 13 hours ago

So many of the replies are clearly AI. “That’s not X — it’s Y.”

gnarlouse 17 hours ago

out of curiosity, i wonder if people are taking stabs at p!=np

testaccount28 14 hours ago
"our new grad student made progress on the combinatorics problem we posed!"
"oh awesome let's see if he can solve p!=np!"
- ykonstant 2 hours ago
  
  Yes, too many people here do not understand the distance between the problems the article is discussing (and LLMs have solved) and the big problems in math and CS.
  
  1 reply →

bharxhav 14 hours ago

Ramanujan is a good analogy for this situation. Theories could be right/wrong, until there's a proof. Same with anything AI produces. There's always a "told you so" baked in with it's response.

johnwhitman 6 hours ago

[dead]

adrithmetiqa 18 hours ago

Super interesting but what does this mean for us mere mortals?

dataviz1000 18 hours ago
I got Claude to self reference and update its own instructions to solve making a typed proxy API of any website. After a week, scores of iterations, it can reverse engineer any website. The first few days I had to be deeply involved with each iteration loop. Domain knowledge is helpful. Each time I saw a problem I would ask Claude to update its instructions so it doesn't happen again. Then less and less. Eventually it got to the point it was updating and improving the metrics every iteration unsupervised.
Edit: This is going to have huge ramifications for the tech security industry as these systems will be able to break security systems as easily it solved the proof. The sooner the good guys, if there are any left, understand this the better it will be for everybody.
> Super interesting but what does this mean for us mere mortals?
I would go for a 2 or 3 hour walk with my phone using the remote control feature looking every 5 - 10 minutes to make sure it doesn't need human help. I went to the coffeeshop and drank very good coffee listening to music. Then at night I sat and had a beer thinking about T.S. Eliot's 'The Wasteland', the effect of industrialization in England at that time and his views of how ennui affected the aristocracy.
- DrewADesign 17 hours ago
  
  > I went to the coffeeshop and drank very good coffee listening to music. Then at night I sat and had a beer thinking about T.S. Eliot's 'The Wasteland', the effect of industrialization in England at that time and his views of how ennui affected the aristocracy.
  Well, for those among us that are not aristocracy already, except for the vanishingly small number of people required to oversee such processes, we’re probably the closest we’re going to get to it. If they don’t need people to do the tech labor, we’ve got way more people than we need, so that’s a huge oversupply of tech skills, which means tech skills are rapidly becoming worthless. Glad to see how fast we’re moving in our very own race to the bottom!
  
  20 replies →
- frizlab 17 hours ago
  
  > I would go for a 2 or 3 hour walk with my phone using the remote control feature looking every 5 - 10 minutes to make sure it doesn't need human help.
  That is a nightmarish scenario tbh
  
  7 replies →
- dunder_cat 16 hours ago
  
  > Edit: This is going to have huge ramifications for the tech security industry as these systems will be able to break security systems as easily it solved the proof. The sooner the good guys, if there are any left, understand this the better it will be for everybody.
  What can the good guys do? Fire up Claude to improve their systems? Unless you have it working fully autonomously to counter-act abuse, I don't see how you can beat the "bad guys". There may be some industries where this is a solved problem (e.g. you can do all the validation server-sided, religiously follow best practices to prevent and mitigate abuse), but a lot of stuff like multiplayer video games will be doomed unless they move to a "you must use a locked down system we control" model. I honestly don't consider it liberating as someone that has various hobby projects, that now in addition to plain old DDoS I'll also have people spin up layer 7 attacks with just their credit card. It almost makes me want to give up instead of pushing forward in a world where the worst of the worst has access to the best of the best.
  
  1 reply →
- ale 16 hours ago
  
  This type of slop comment is somehow worse than spam.
  >After a week, scores of iterations, it can reverse engineer any website
  Cool, let’s see the proof.
  
  4 replies →
- troupo 17 hours ago
  
  > I would go for a 2 or 3 hour walk with my phone using the remote control feature looking every 5 - 10 minutes
  2-3 hours "walking" while having to check in every 5-10 minutes?
  If I have to check in every 5-10 minutes, I won't taste coffee or hear that there's good music playing.
  
  2 replies →
- colechristensen 16 hours ago
  
  I have similar amounts of success (pretty good!) standing in line at a coffee shop talking to people who work for me through some action that needs to be taken and doing the same with AI.
  However I do not trust AI anywhere near as much as I trust the humans. The AI is super capable but also occasionally a psychopath toddler. I sat in amused astonishment when faced with job 2 not running because job 1 was failing Claude went in to the database, changed the failure record to success, triggered job 2 which produced harmful garbage, and then claimed victory. Only the most troubled person would even think of doing that, but Claude thought it was the best solution.
  
  1 reply →
- virtue3 16 hours ago
  
  That's fucking insane. Thank you for sharing.
  I had a bad feeling we were basically already there.
TrainedMonkey 18 hours ago
My understanding is that, if confirmed, this demonstrates that AI can find novel solutions. This is a strong counterpoint to generative-AI-is-strictly-limited-to-training-data.
- dijksterhuis 16 hours ago
  
  https://en.wikipedia.org/wiki/AlphaFold ...
  we've had AlphaFold for a while. it's not a novel that we have ML solutions that can find, erm, novel solutions.
  however, by and large, most LLMs as typically used by most individuals aren't solving novel problems. and in those scenarios, we often end up with regurgitated/most common/lowest common denominator outputs... it's a probability distribution thing.
- in-silico 11 hours ago
  
  > generative-AI-is-strictly-limited-to-training-data
  Anyone who understands reinforcement learning already knows that's not the case.
- psychoslave 16 hours ago
  
  Put in the hands of great mathematicians, pencil and paper proved able to write proofs of open problems.
  
  1 reply →
- artninja1988 18 hours ago
  
  [dead]
- hrmtst93837 16 hours ago
  
  [dead]
muskstinks 16 hours ago

Another signal that we still have relevant progress in ai.
Also that it is now good enough to make researchers faster.
brcmthrowaway 18 hours ago
Learn plumbing
- oytis 18 hours ago
  
  There is no reason why market for plumbing will get much larger than it is now (which is not too large)
  
  4 replies →
- incognito124 17 hours ago
  
  Where I live it's bathroom and kitchen tiling
- radu_floricica 17 hours ago
  
  This is kindof the opposite? Man + AI > either man or AI. I'd say "learn to work with Claude" is the better lesson here.
  
  3 replies →
- NitpickLawyer 18 hours ago
  
  I know your reply was half joking, so please take this the same way, but ... are you sure about that? https://www.youtube.com/watch?v=p1ip68Vv7NE
  
  3 replies →
- dakolli 18 hours ago
  
  AI isn't replacing anything, get over yourself.
  
  1 reply →
heliumtera 17 hours ago

That llms in the middle of everything will continue until morale improve because llms can generate text on top of bullshit made up problems

pugchat 14 hours ago

[dead]

techpulse_x 5 hours ago

[dead]

linncharm 4 hours ago

[dead]

riteshyadav02 15 hours ago

[dead]