> no prior solutions found.
This is no longer true, a prior solution has just been found[1], so the LLM proof has been moved to Section 2 of Terence Tao's wiki[2].
[1] - https://www.erdosproblems.com/forum/thread/281#post-3325
[2] - https://github.com/teorth/erdosproblems/wiki/AI-contribution...
Interesting that in Terence Tao's words: "though the new proof is still rather different from the literature proof)"
And even odder that the proof was by Erdos himself and yet he listed it as an open problem!
Maybe it was in the training set.
It looks like these models work pretty well as natural language search engines and at connecting together dots of disparate things humans haven't done.
They're finding them very effective at literature search, and at autoformalization of human-written proofs.
Pretty soon, this is going to mean the entire historical math literature will be formalized (or, in some cases, found to be in error). Consider the implications of that for training theorem provers.
Every time this topic comes up people compare the LLM to a search engine of some kind.
But as far as we know, the proof it wrote is original. Tao himself noted that it’s very different from the other proof (which was only found now).
That’s so far removed from a “search engine” that the term is essentially nonsense in this context.
This illustrates how unimportant this problem is. A prior solution did exist, but apparently nobody knew because people didn't really care about it. If progress can be had by simply searching for old solutions in the literature, then that's good evidence the supposed progress is imaginary. And this is not the first time this has happened with an Erdős problem.
A lot of pure mathematics seems to consist in solving neat logic puzzles without any intrinsic importance. Recreational puzzles for very intelligent people. Or LLMs.
> "intrinsic importance"
"Intrinsic" in contexts like this is a word for people who are projecting what they consider important onto the world. You can't define it in any meaningful way that's not entirely subjective.
It's hard to predict which maths results from 100 years ago will surface in, say, quantum mechanics or cryptography.
It shows that an 'LLM' can now work on issues like this today, and tomorrow it can do even more.
Don't be so ignorant. A few years ago NO ONE could have come up with something as generic as an LLM, which will help you solve this kind of problem and also create text adventures and Java code.
There is still enormous value in cleaning up the long tail of somewhat important stuff. One of the great benefits of Claude Code to me is that smaller issues no longer rot in backlogs, but can be at least attempted immediately.
Pity that HN's ability to detect sarcasm is as robust as that of a sentiment analysis model using keyword-matching.
I firmly believe @threethirtytwo’s reply was not produced by an LLM
Are you expecting people who can't detect self-delusions to be able to detect sarcasm, or are you just being cruel?
> This is a relief, honestly. A prior solution exists now, which means the model didn’t solve anything at all. It just regurgitated it from the internet, which we can retroactively assume contained the solution in spirit, if not in any searchable or known form. Mystery resolved.
Vs
> Interesting that in Terence Tao's words: "though the new proof is still rather different from the literature proof)"
I suspect this is AI generated, but it’s quite high quality, and doesn’t have any of the telltale signs that most AI generated content does. How did you generate this? It’s great.
Why not plan for a future where a lot of non-trivial tasks are automated instead of living on the edge with all this anxiety?
Can anyone give a little more color on the nature of Erdos problems? Are these problems that many mathematicians have spent years tackling with no result? Or do some of the problems evade scrutiny and go unattempted most of the time?
EDIT: After reading a link someone else posted to Terence Tao's wiki page, I found a paragraph of his that somewhat answers this question:
> Erdős problems vary widely in difficulty (by several orders of magnitude), with a core of very interesting, but extremely difficult problems at one end of the spectrum, and a "long tail" of under-explored problems at the other, many of which are "low hanging fruit" that are very suitable for being attacked by current AI tools. Unfortunately, it is hard to tell in advance which category a given problem falls into, short of an expert literature review. (However, if an Erdős problem is only stated once in the literature, and there is scant record of any followup work on the problem, this suggests that the problem may be of the second category.)
from here: https://github.com/teorth/erdosproblems/wiki/AI-contribution...
Erdős was an incredibly prolific mathematician, and one of his quirks was that he liked to collect open problems and pose new ones as a challenge to the field. He attached bounties to many of the problems, ranging from $5 to $10,000.
The problems are a pretty good metric for AI, because the easiest ones at least meet the bar of "a top mathematician didn't know how to solve this off the top of his head" and the hardest ones are major open problems. As AI progresses, we will see it slowly climb the difficulty ladder.
Don't feel bad for being out of the loop. The author and Tao did not care enough about the Erdős problem to realize the proof was published by Erdős himself. So you never cared enough, and neither did they. But they do care about screaming about LLM breakthroughs on the fediverse and Twitter.
> Did not care enough about the Erdős problem...
This is bad faith. Erdős was an incredibly prolific mathematician; it is unreasonable to expect anyone to have memorized his entire output. Yet Tao knows enough about Erdős to know which mathematical techniques he regularly used in his proofs.
From the forum thread about Erdos problem 281:
> I think neither the Birkhoff ergodic theorem nor the Hardy-Littlewood maximal inequality (some version of either was the key ingredient to unlock the problem) were in the regular toolkit of Erdos and Graham (I'm sure they were aware of these tools, but would not instinctively reach for them for this sort of problem). On the other hand, the aggregate machinery of covering congruences looks relevant (even though ultimately it turns out not to be), and was very much in the toolbox of these mathematicians, so they could have been misled into thinking this problem was more difficult than it actually was due to a mismatch of tools.
> I would assess this problem as safely within reach of a competent combinatorial ergodic theorist, though with some thought required to figure out exactly how to transfer the problem to an ergodic theory setting. But it seems the people who looked at this problem were primarily expert in probabilistic combinatorics and covering congruences, which turn out to not quite be the right qualifications to attack this problem.
This Tao dude, does he get invited to a lot of AI conferences (accommodation included)?
From Terry Tao's comments in the thread:
"Very nice! ... actually the thing that impresses me more than the proof method is the avoidance of errors, such as making mistakes with interchanges of limits or quantifiers (which is the main pitfall to avoid here). Previous generations of LLMs would almost certainly have fumbled these delicate issues.
...
I am going ahead and placing this result on the wiki as a Section 1 result (perhaps the most unambiguous instance of such, to date)"
The pace of change in math is going to be something to watch closely. Many minor theorems will fall. Next major milestone: Can LLMs generate useful abstractions?
Seems like someone dug something up from the literature on this problem (see the top comment on the erdosproblems.com thread).
"On following the references, it seems that the result in fact follows (after applying Rogers' theorem) from a 1936 paper of Davenport and Erdos (!), which proves the second result you mention. ... In the meantime, I am moving this problem to Section 2 on the wiki (though the new proof is still rather different from the literature proof)."
Personally, I'd prefer if the AI models would start with a proof of their own statements. Time and again, SOTA frontier models told me: "Now you have 100% correct code ready for production in enterprise quality." Then I run it and it crashes. Or maybe the AI is just being tongue-in-cheek?
Case in point: I just wanted to give z.ai a try and buy some credits. I used Firefox with uBlock and the payment didn't go through. I tried again with Chrome and no ad blocker, but now there is an error: "Payment Failed: p.confirmCardPayment is not a function." The irony is that this was surely vibe-coded with z.ai itself, which tries to sell me on how good it is but then can't complete the sale.
And we will get lots more of this in the future. LLMs are a fantastic new technology, but even more fantastically over-hyped.
We should differentiate AI models from AI apps.
Models just generate text. Apps are supposed to make that text useful.
An app can run various kinds of verification. But would you pay extra for that?
Nobody can make a text generator that outputs text which is 100% correct. That's just not a thing people can do right now.
You get AIs to prove their code is correct in precisely the same ways you get humans to prove their code is correct. You make them demonstrate it through tests or evidence (screenshots, logs of successful runs).
Yes! Also, make sure to check those results yourself, dear reader, rather than ask the agent to summarize the results for you! ^^;
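A minimal sketch of what that kind of executable evidence can look like, assuming a hypothetical `add` function as the code under test (everything below is illustrative, not from the thread): the agent's claim only counts once a run like this actually passes.

    # Toy pytest-style checks an agent could be asked to produce and run as evidence.
    # `add` stands in for whatever code the agent generated.
    def add(a, b):
        return a + b

    def test_add_handles_negatives():
        assert add(-2, 3) == 1

    def test_add_is_commutative_on_samples():
        for a, b in [(0, 0), (1, 2), (-5, 7)]:
            assert add(a, b) == add(b, a)

Run it with pytest and keep the log as the artifact, rather than trusting a prose summary of the result.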
The erdosproblems thread itself contains comments from Terence Tao: https://www.erdosproblems.com/forum/thread/281
Has anyone verified this?
I've "solved" many math problems with LLMs, with LLMs giving full confidence in subtly or significantly incorrect solutions.
I'm very curious here. The Open AI memory orders and claims about capacity limits restricting access to better models are interesting too.
Terence Tao gave it the thumbs up. I don't think you're going to do better than that.
It's already been walked back.
There was a post about Erdős 728 being solved with Harmonic’s Aristotle a little over a week ago [1] and that seemed like a good example of using state-of-the-art AI tech to help increase velocity in this space.
I’m not sure what this proves. I dumped a question into ChatGPT 5.2 and it produced a correct response after almost an hour [2]?
Okay? Is it repeatable? Why did it come up with this solution? How did it come up with the connections in its reasoning? I get that it looks correct and Tao’s approval definitely lends credibility that it is a valid solution, but what exactly is it that we’ve established here? That the corpus that ChatGPT 5.2 was trained on is better tuned for pure math?
I’m just confused what one is supposed to take away from this.
[1] https://chatgpt.com/share/696ac45b-70d8-8003-9ca4-320151e081...
Also #124 was proved using AI 49 days ago: https://news.ycombinator.com/item?id=46094037
Thanks for the curious question. This is one in a sequence of efforts to use LLMs to generate candidate proofs to open mathematical questions, which then are generally formalized into Lean, a formal proof system for pure mathematics.
Erdős was prolific, and many of his open problems are numbered and have space for discussion online, so it's become fairly common to run through them with frontier models and see if a good proof can be produced; there have been some notable successes here this year.
Tao seems to engage in sort of a two-step approach with these proofs. First, are they correct? Lean formalization makes that unambiguous, but not all proofs are easily formalized in Lean, so he also just, you know, checks them. Second, a literature search, both with LLMs and without, for prior results; this is to check where frontier models are at in the 'novel proofs or just regurgitated proofs' space.
To my knowledge, we’re currently at the point where we are seeing some novel proofs offered, but I don’t think we’ve seen any that have absolutely no priors in literature.
As you might guess this is itself sort of a Rorschach test for what AI could and will be.
In this case, it looked at first like this was a totally novel solution to something that hadn’t been solved before. On deeper search, Tao noted it’s almost trivial to prove with stuff Erdos knew, and also had been proved independently; this proof doesn’t use the prior proof mechanism though.
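To make the "formalized into Lean" step above concrete: a Lean proof is a machine-checked artifact that the proof assistant either accepts or rejects. A deliberately tiny illustration of the workflow (not the Erdős-problem formalization discussed here):

    -- Lean 4: a trivial statement whose proof term is checked by the kernel.
    theorem add_comm_toy (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b

An LLM-generated proof of an Erdős problem would be vastly longer, but the acceptance criterion is the same: the checker either closes the goal or it doesn't.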
FWIW, I just gave Deepseek the same prompt and it solved it too (much faster than the 41m of ChatGPT). I then gave both proofs to Opus and it confirmed their equivalence.
The answer is yes. Assume, for the sake of contradiction, that there exists an \(\epsilon > 0\) such that for every \(k\), there exists a choice of congruence classes \(a_1^{(k)}, \dots, a_k^{(k)}\) for which the set of integers not covered by the first \(k\) congruences has density at least \(\epsilon\).
For each \(k\), let \(F_k\) be the set of all infinite sequences of residues \((a_i)_{i=1}^\infty\) such that the uncovered set from the first \(k\) congruences has density at least \(\epsilon\). Each \(F_k\) is nonempty (by assumption) and closed in the product topology (since it depends only on the first \(k\) coordinates). Moreover, \(F_{k+1} \subseteq F_k\) because adding a congruence can only reduce the uncovered set. By the compactness of the product of finite sets, \(\bigcap_{k \ge 1} F_k\) is nonempty.
Choose an infinite sequence \((a_i) \in \bigcap_{k \ge 1} F_k\). For this sequence, let \(U_k\) be the set of integers not covered by the first \(k\) congruences, and let \(d_k\) be the density of \(U_k\). Then \(d_k \ge \epsilon\) for all \(k\). Since \(U_{k+1} \subseteq U_k\), the sets \(U_k\) are decreasing and periodic, and their intersection \(U = \bigcap_{k \ge 1} U_k\) has density \(d = \lim_{k \to \infty} d_k \ge \epsilon\). However, by hypothesis, for any choice of residues, the uncovered set has density \(0\), a contradiction.
Therefore, for every \(\epsilon > 0\), there exists a \(k\) such that for every choice of congruence classes \(a_i\), the density of integers not covered by the first \(k\) congruences is less than \(\epsilon\).
\boxed{\text{Yes}}
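(For readers trying to follow the back-and-forth below: reconstructed from the hypothesis and conclusion of this proof, and only as a paraphrase rather than the official problem statement, the claim being argued is a uniformity statement. For a fixed sequence of moduli \(n_1, n_2, \dots\),
\[
\Bigl(\forall (a_i):\ d\bigl(\{m : m \not\equiv a_i \ (\mathrm{mod}\ n_i)\ \text{for all } i\}\bigr) = 0\Bigr)
\implies
\forall \epsilon > 0\ \exists k\ \forall (a_i):\ d\bigl(\{m : m \not\equiv a_i \ (\mathrm{mod}\ n_i)\ \text{for } i \le k\}\bigr) < \epsilon,
\]
where \(d\) denotes natural density, taken as upper density where the limit need not exist.)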
> I then gave both proofs to Opus and it confirmed their equivalence.
You could have just rubber-stamped it yourself, for all the mathematical rigor it holds. The devil is in the details, and the smallest problem unravels the whole proof.
How dare you question the rigor of the venerable LLM peer review process! These are some of the most esteemed LLMs we are talking about here.
"Since \(U_{k+1} \subseteq U_k\), the sets \(U_k\) are decreasing and periodic, and their intersection \(U = \bigcap_{k \ge 1} U_k\) has density \(d = \lim_{k \to \infty} d_k \ge \epsilon\)."
Is this enough? Let \(U_k\) be the set of integers whose remainder mod \(6^n\) is greater than or equal to \(2^n\) for all \(1 < n < k\). The density of each \(U_k\) is more than 1/2, I think, but not that of the intersection (which is empty), right?
Indeed. Your sets are decreasing and periodic, with density always greater than \(\prod_{k=1}^{\infty}\bigl(1-(1/3)^k\bigr) \approx 0.56\), yet their intersection is empty.
This would all be a fairly trivial exercise in diagonalization if such a lemma as implied by Deepseek existed.
(Edit: The bounding I suggested may not be precise at each level, but it is asymptotically the limit of the sequence of densities, so up to some epsilon it demonstrates the desired counterexample.)
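For the concrete family in this counterexample (integers whose remainder mod \(6^n\) is at least \(2^n\) for each \(n\) with \(1 < n < k\)), the densities are easy to check numerically. This is only an illustration of the figures quoted above, computed exactly over one period and restricted to positive integers:

    # Exact density of U_k = { m >= 0 : m mod 6**n >= 2**n for all 1 < n < k },
    # counted over one full period 6**(k-1) (the set is periodic with that period).
    def density_U(k):
        period = 6 ** (k - 1)
        count = sum(
            1
            for m in range(period)
            if all(m % 6 ** n >= 2 ** n for n in range(2, k))
        )
        return count / period

    for k in range(3, 8):
        print(k, round(density_U(k), 4))
    # The densities decrease toward a limit but stay well above 1/2, yet no positive
    # integer lies in every U_k: once 6**n > m we have m % 6**n == m, which is
    # eventually smaller than 2**n.

So the sets are decreasing, periodic, and of density bounded away from zero, while their intersection over the positive integers is empty, which is exactly what breaks the step quoted above.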
Here's kimi-k2-thinking with the reasoning block included: https://www.kimi.com/share/19bcfe2e-d9a2-81fe-8000-00002163c...
I am not familiar with the field, but is there any chance that DeepSeek is just memorizing the existing solution? Or is it different?
https://news.ycombinator.com/item?id=46664976
Sure, but if so, wouldn't ChatGPT 5.2 Pro also "just be memorizing the existing solution"?
Opus isn't a good choice for anything math-related; it's worse at math than the latest ChatGPT and Gemini Pro.
I find it interesting that, as someone utterly unfamiliar with ergodic theory, Dini’s theorem, etc, I find Deepseek’s proof somewhat comprehensible, whereas I do not find GPT-5.2’s proof comprehensible at all. I suspect that I’d need to delve into the terminology in the GPT proof if I tried to verify Deepseek’s, so maybe GPT’s is being more straightforward about the underlying theory it relies on?
A surprising % of these LLM proofs are coming from amateurs.
One wonders if some professional mathematicians are instead choosing to publish LLM proofs without attribution for career purposes.
It's probably from the perennial observation
"This LLM is kinda dumb in the thing I'm an expert in"
This is just not true at this point but believe whatever you want to believe.
Perennial doesn't make sense in the context of something that has been around for a few months. Observations from the spring 2025 crop of LLMs are already irrelevant.
… “but I guess it was able to formalize it in Lean, so…”
>One wonders if some professional mathematicians are instead choosing to publish LLM proofs without attribution for career purposes.
This will just become the norm as these models improve, if it isn't largely already the case.
It's like sports where everyone is trying to use steroids, because the only way to keep up is to use steroids. Except there aren't any AI detectors, and it's not breaking any rules (except perhaps some kind of self-imposed moral code) to use AI.
I think a more realistic answer is that professional mathematicians have tried to get LLMs to solve their problems and the LLMs have not been able to make any progress.
I think it's a bit early to tell whether GPT 5.2 has helped research mathematicians substantially given its recency. The models move so fast that even if all previous models were completely useless I wouldn't be sure this one would be. Let's wait a year and see? (it takes time to write papers)
I'm actually not sure what the right attribution method would be. I'd lean towards a single line in the acknowledgements? You might use it at, for example, every lemma during brainstorming, but it's unclear whether the right convention is to thank it at every lemma...
Anecdotally, I, as a math postdoc, think that GPT 5.2 is qualitatively much stronger than anything else I've used. Its rate of hallucinations is low enough that my default assumption about any solution is no longer that it is trying to hide a mistake somewhere. Compared with Gemini 3, whose failure mode when it can't solve something is always to pretend it has a solution by "lying", omitting steps, making up theorems, etc., GPT 5.2 usually fails gracefully, and when it makes a mistake it can more often than not admit it when pointed out.
I guess the first question I have is if these problems solved by LLMs are just low-hanging fruit that human researchers either didn't get around to or show much interest in - or if there's some actual beef here to the idea that LLMs can independently conduct original research and solve hard problems.
That's the first warning from the wiki: "Erdős problems vary widely in difficulty (by several orders of magnitude), with a core of very interesting, but extremely difficult problems at one end of the spectrum, and a 'long tail' of under-explored problems at the other, many of which are 'low hanging fruit' that are very suitable for being attacked by current AI tools." https://github.com/teorth/erdosproblems/wiki/AI-contribution...
There is still value in letting these LLMs loose on the periphery and knocking out all the low-hanging fruit humanity hasn't had the time to get around to. Also, I don't know this, but if it is a problem on the Erdős list, I presume people tried to solve it at least a little before it made it onto the list.
Is there though? If they are "solved" (as in, the tickbox marks them as such through a validation process, e.g. another model confirming, a formal proof passing, etc.) but no human actually learns anything from them, what's the benefit? Completing a list?
I believe the ones that are NOT studied are unstudied precisely because they are seen as uninteresting. Even if they were solved in an interesting way, if nobody reads the proofs, because there are simply too many of them and they are, again, not considered valuable, then I don't see what is gained.
Out of curiosity why has the LLM math solving community been focused on the Erdos problems over other open problems? Are they of a certain nature where we would expect LLMs to be especially good at solving them?
I guess they sit at a difficulty where they're not too hard (unlike the Millennium Prize problems), are fairly tightly scoped (unlike open-ended research), and carry some gravitas (so it's not some obscure theorem that's only unproven because of its lack of noteworthiness).
I actually don't think the reason is that they are easier than other open math problems. I think it's more that they are "elementary" in the sense that the problems usually don't require a huge amount of domain knowledge to state.
People like checking items off of lists.
Is there explainability research for this type of model application? E.g. a sparse autoencoder or something similar but more modern.
I would love to know which concepts are active in the deeper layers of the model while generating the solution.
Is there a concept of “epsilon” or “delta”?
What are their projections on each other?
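For what it's worth, the standard tool being gestured at here is a sparse autoencoder trained on a model's internal activations, so that "which concepts are active" becomes "which learned features fire." A toy numpy sketch of just the forward pass and loss, with made-up dimensions and random weights standing in for trained ones (illustrative only, not tied to any particular model):

    import numpy as np

    rng = np.random.default_rng(0)
    d_model, d_features = 64, 512            # toy sizes; real SAEs are far larger

    W_enc = rng.normal(0, 0.05, (d_model, d_features))
    b_enc = np.zeros(d_features)
    W_dec = rng.normal(0, 0.05, (d_features, d_model))

    def sae_forward(x):
        """Encode an activation vector into sparse features, then reconstruct it."""
        f = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU gives a mostly-zero feature vector
        x_hat = f @ W_dec                        # reconstruction of the activation
        return f, x_hat

    x = rng.normal(size=d_model)                 # stand-in for a residual-stream activation
    f, x_hat = sae_forward(x)
    loss = np.sum((x - x_hat) ** 2) + 1e-3 * np.sum(np.abs(f))   # reconstruction + L1 sparsity

The large entries of `f` for a given input are the "active concepts"; whether anyone has run this kind of analysis on a model while it generates a proof like this one is exactly the open question being asked.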
I wonder if they tried Gemini. I think Gemini could have done better, as seen from my experiences with GPT and Gemini models on some simple geometry problems.
I'm looking forward to ChatGPT 5.3 Pro. I also use ChatGPT 5.2 Pro for various programming consultations. It's been very helpful.
I was hoping there'd be more discussion about the model itself. I find the last couple of generations of Pro models fascinating.
Personally, I've been applying them to hard OCR problems. Many varied languages concurrently, wildly varying page structure, and poor scan quality; my dataset has all of these things. The models take 30 minutes a page, but the accuracy is basically 100% (they'll still struggle with perfectly placed bits of mold). The next best model (Google's flagship) rests closer to 80%.
I'll be VERY intrigued to see what the next 2, 5, 10 years does to the price of this level of model.
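Out of curiosity about how "basically 100%" versus "closer to 80%" gets measured: a common way to score a transcription against a ground-truth page is character error rate (edit distance divided by reference length). A self-contained sketch, with hypothetical file names, in case anyone wants to run a similar comparison:

    def edit_distance(a: str, b: str) -> int:
        """Levenshtein distance via dynamic programming."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                 # deletion
                               cur[j - 1] + 1,              # insertion
                               prev[j - 1] + (ca != cb)))   # substitution
            prev = cur
        return prev[-1]

    def char_accuracy(reference: str, hypothesis: str) -> float:
        """1 - CER; 1.0 means a perfect transcription of the page."""
        if not reference:
            return 1.0
        return 1.0 - edit_distance(reference, hypothesis) / len(reference)

    # Hypothetical usage:
    # ref = open("page_017.gt.txt", encoding="utf-8").read()
    # hyp = open("page_017.model.txt", encoding="utf-8").read()
    # print(char_accuracy(ref, hyp))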
We're eventually going to get it at cerebras inference latency. It's going to be wild.
Sounds like Lean 4/rocq did all the work here
Why do you say that? I see no mention of lean/rocq on the twitter thread, nor on the erdos problem forum thread, nor on the chatGPT conversation.
I can post a long list of simple things a human can do accurately and efficiently that I've seen Gemini unable to do, repeatedly.
And someone could post an even longer list of things you can't do well. But what would be the point?
The LLM did better on this problem than 100% of the haters in this thread could do, most of whom probably can't even begin to "understand" the problem.
The LLMs that take 10 attempts to un-zero-width a <div>, telling me that every single change totally fixed the problem, are cracking the hardest math problems again.
What does "solved with" mean? The author claims "I've solved", so did the author solve it or GPT?
When you use a calculator, did you really solve it or was it the calculator?
With a calculator I supply the arithmetic. It just executes it with no reasoning, so I'm the solver. I can do the same with an LLM and still be the solver as long as it just follows my direction. Or I can give it a problem and let it reason and generate the arithmetic itself, in which case the LLM is effectively the solver. That's why saying "I've solved X using only GPT" is ambiguous.
But thanks for the downvote in addition to your useless comment.
It’s funny. in some kind of twisted variant of Cunningham’s Law we have:
> the best way to find a previous proof of a seemingly open problem on the internet is not to ask for it; it's to post a new proof
>no prior solutions found.
They never bothered to check Erdős's solution, already published 90 years ago. I am still confused about why Erdős, who proposed the problem and the solution, would consider this an unsolved problem, but this group of researchers would claim "ohh my god look at this breakthrough".
This is crazy. It's clear that these models don't have human intelligence, but it's undeniable at this point that they have _some_ form of intelligence.
If LLMs weren't created by us but were something discovered in another species' behaviour, it would be 100% labelled intelligence.
Yes, same for the case where the technology would have been found embodied in machinery aboard a crashed UFO.
My take is that a huge part of human intelligence is pattern matching. We just didn’t understand how much multidimensional geometry influenced our matches
Yes, it could be that intelligence is essentially a sophisticated form of recursive, brute force pattern matching.
I'm beginning to think the Bitter Lesson applies to organic intelligence as well, because basic pattern matching can be implemented relatively simply using very basic mathematical operations like multiply and accumulate, and so it can scale with massive parallelization of relatively simple building blocks.
I don't think it's accurate to describe LLMs as pattern matching. Prediction is the mechanism they use to ingest and output information, and they end up with a (relatively) deep model of the world under the hood.
Intelligence is hallucination that happens to produce useful results in the real world.
I don't think they will ever have human intelligence. It will always be an alien intelligence.
But I think the trend line unmistakably points to a future where it can be MORE intelligent than a human in exactly the colloquial way we define "more intelligent"
The fact that one of the greatest mathematicians alive maintains a page and is seriously benchmarking this shows how likely he believes this is to happen.
Well, Alpha Go and Stockfish can beat you at their games. Why shouldn't these models beat us at math proofs?
Chess and Go have very restrictive rules. It seems a lot more obvious to me why a computer can beat a human at it. They have a huge advantage just by being able to calculate very deep lines in a very short time. I actually find it impressive for how long humans were able to beat computers at go. Math proofs seem a lot more open ended to me.
AlphaGo and Stockfish were specifically designed and trained to win at those games.
Depends on what you mean by intelligence, human intelligence and human
Yes, it is intelligent, but so what? It's not conscious, sentient, or sapient. It's a pattern-matching Chinese room.
It's pattern matching. Which is actually what we measure in IQ tests, just saying.
There's some nuance. IQ tests measure pattern matching and, in an underlying way, other facets of intelligence, memory for example. How well can an LLM 'remember' a thing? Sometimes Claude will perform compaction when its context window reaches 200k "tokens", and then it seems a little colder to me, but maybe that's just my imagination. I'm kind of a "power user".
I call it matching. Pattern matching had a different meaning.
As someone who doesn't understand this shit, and given that it's always the experts who fiddle with the LLMs to get good outputs, it feels natural to attribute the intelligence to the operator (or the training set) rather than to the LLM itself.
This is showing as unresolved here, so I'm assuming something was retracted.
https://mehmetmars7.github.io/Erdosproblems-llm-hunter/probl...
I think that just hasn't been updated.
How did they do it? Was a human using the chat interface? Did they just type out the problem and immediately receive a complete solution on the first reply (one-shot), or what was the human's role? What was ChatGPT's thinking time?
Here's the chat: https://chatgpt.com/share/696ac45b-70d8-8003-9ca4-320151e081...
Very interesting. ChatGPT reasoned about it for 41 minutes! Also, this was one-shot, i.e. ChatGPT produced its complete proof from a single prompt with no further replies by the human (rather than a chat where the human guided it further).
Funny seeing Silicon Valley bros commenting "you're on fire!" to Neel when it appears he copied and pasted the problem verbatim into ChatGPT and it did literally all the other work here.
https://chatgpt.com/share/696ac45b-70d8-8003-9ca4-320151e081...
Narrator: The solution had already appeared several times in the training data
This must be what it feels like to be a CEO and someone tells me they solved coding.
I have 15 years of software engineering experience across some top companies. I truly believe that ai will far surpass human beings at coding, and more broadly logic work. We are very close
HN will be the last place to admit it; people here seem to be holding out with the vague 'I tried it and it came up with crap', while many of us are shipping software without touching (much) code anymore. I have written code for over 40 years, and this is nothing like no-code or whatever else was 'replacing programmers' before; this is clearly different, judging from the people who cannot code with a gun to their heads but are still shipping apps. It does not really matter if anyone believes me or not: I am making more money than ever, with fewer people than ever, delivering more than ever.
We are very close.
(by the way; I like writing code and I still do for fun)
Both can be correct: you might be making a lot of money using the latest tools, while others who work on very different problems have tried the same tools and they're just not good enough for them.
The ability to make money proves you found a good market, it doesn't prove that the new tools are useful to others.
> holding out with the vague 'I tried it and it came up with crap'
Isn't that a perfectly reasonable metric? The topic has been dominated by hype for at least the past 5 if not 10 years. So when you encounter the latest in a long line of "the future is here the sky is falling" claims, where every past claim to date has been wrong, it's natural to try for yourself, observe a poor result, and report back "nope, just more BS as usual".
If the hyped future does ever arrive then anyone trying for themselves will get a workable result. It will be trivially easy to demonstrate that naysayers are full of shit. That does not currently appear to be the case.
> I have 15 years of software engineering experience across some top companies. I truly believe that ai will far surpass human beings at coding, and more broadly logic work. We are very close
Coding was never the hard part of software development.
Getting the architecture mostly right, so it's easy to maintain and modify in the future, is IMO the hard part, but I find that this is where AI shines. I have 20 years of SWE experience (professional, plus 10 hobby), and most of my AI use is for architecture and scaffolding first, code second.
They already do. What they suck at is common sense. Unfortunately good software requires both.
Most people also suck at common sense, including most programmers, hence most programmers do not write good software to begin with.
Or is it fortunate (for a short period at least).
Gotta make sure that the investors read this message in an Erdos thread.
Is this comment written by AI?
They can only code to specification, which is where even teams of humans get lost. Without much smarter architecture for AI (LLMs as-is are a joke), that needle isn't going to move.
Real HN comment right here. "LLMs are a joke" - maybe don't drink the anti-hype kool aid, you'll blind yourself to the capability space that's out there, even if it's not AGI or whatever.
Has anyone confirmed the solution is not in the training data? Otherwise it is just a bit of information retrieval, LLM-style. No intelligence necessary.