Comment by briandw

3 hours ago

Most of the arguments here feel like gate keeping and resistance to change. I didn't see any arguments that were directly about advancing the state of knowledge of math.

“Current automated techniques can produce plausible but unreliable (or even incorrect) arguments which are difficult to distinguish from correct mathematical proofs.”

That seems like a problem for mathematics with or without AI.

Isn’t this a problem with human proofs as well?

“Many current models are also built on data obtained by systematically exploiting licenses and access arrangements that were not made with artificial intelligence in mind, or indeed by simply violating copyright protections”

Copyright? The copyright arguments have been hard to make in domains where copyright is much stronger, mathematical knowledge isn’t even subject to copyright.

“Technologies which affect the way in which mathematics is practiced may disturb the current system of incentives”

Resistance to change again.

“Proper evaluation is endangered if results are communicated through informal channels”

Gatekeeping again.

There is some of that but I wouldn't call it gatekeeping. Universities lately promote citations and publications so there's a sense that results are all that matters. Results matter, yes, but there's a human side too where we're kind of asking about human creativity and ability. To me an appropriate analogy is in climbing Mt. Everest. Proving something, or even writing a thesis, is like climbing Mt. Everest. A lot of the value is actually in the effort you put into it. You could take a helicopter ride up to the top and then climb a few steps and claim "You climbed to the peak of Everest". That's like using AI. But if you asked them about what it was like, how they prepared, etc. their answer would not be helpful. So I think there is a lot of value in the journey itself and outsourcing all this to AI would destroy the human part of it.

  • It's completely antithetical to the whole enterprise to hide anything these researchers produce behind a paywall. Id be glad to see that go.

    > or even writing a thesis, is like climbing Mt. Everest. A lot of the value is actually in the effort you put into it.

    As an analogy, in the music industry, if you need a jingle written, you wouldn't care if someone spent five minutes or five years writing it. AI is now filling that formulaic space very well. It won't replace the top end of humans output but it completely outdoes all the boilerplate stuff humans take an age creating

> Isn’t this a problem with human proofs as well?

Human proofs are themselves a kind of a proof of work. They certainly write flawed proofs, but you can expect a human author of a paper to have put in more effort--substantially more--than the human reader needs to verify it. Arguably, this asymmetry disappears for generated proofs.

Automated theorem provers help a bit here, but they don't eliminate the human verification cost.

  • That doesn’t mean it isn’t a problem for human proofs as well. Mathematicians use affiliations and c.v.’s as their primary means of gatekeeping for review, not the effort that went into a proof or its superficial appearances. Especially in hard and big problems which have had any number of flawed proofs.

    Even Grigori Perelman took years to have his proof of Poicaré conjecture accepted, and he had a Ph.D. and a Berkeley fellowship.

    Expecting a human author to put substantially more effort into a proof than needed to verify it is oversimplifying. It is more a matter of credentialing and collegiality in mathematics whereby someone’s reputation and work-product demonstrates that a purported proof put forward for review is likely to be true or at least a valuable or interesting contribution even if imperfect.

    AI makes this a much bigger challenge.

You aren’t really engaging with the substance or heart of the post, and your reading feels a bit knee-jerky and bad-faith to me.

Can't all proofs be eventually broken down into their fundamental pieces and then it's clear as day if it's right or wrong? Seems like a proof would be the best place to determine if an AI is right or not because the output is either right or wrong, there's no subjectivity and the, now common, excuse "well a human would have done the same" won't hold up.

  •    > Can't all proofs be eventually broken down into their fundamental pieces and then it's clear as day if it's right or wrong?
    

    You’d think so, but not really. There are mathematical structures which are unimaginably huge but have little if any reducible structure. For example, in algebra, one of the most basic structures is a Group. When trying to understand a group, one of the most important tools is to break a group into chunks using what’s called a “normal subgroup”. However it turns out that there are some absolutely enormous groups that are “simple” (ie have no normal subgroups). So, there is a set of 26 of these known as the “sporadic simple groups” that just don’t fit any kind of pattern. Proving results about these has proved very difficult because they can’t be broken down (they have no normal subgroups) and by definition just don’t fit any kind of other pattern. One of these, the “monster” group has approximately 8x10^53 members. So you have a set that is unimaginably massive and has very little internal structure as it is “simple” and so can’t be broken down further.

    The proof that there are 26 of these sporadic simple groups is part of the theorem known as the classification of finite simple groups, sometimes known as the “Enormous Theorem”.[1] It took over 100 mathematicians nearly 50 years and resulted in hundreds of papers. Even with that many mathematicians involved, there were still errors and revisions needed to the original proof. Some of the original authors are gradually publishing a somewhat simplified version of the proof but it’s still a massive effort.

    [1] https://en.wikipedia.org/wiki/Classification_of_finite_simpl...

  • Generally, yes, but once broken down you end up with a large number of items that individually each is obviously true, so you know the combined statement Is true, but you don't find out if it is saying what you think it is saying.

    In combining the parts you have the correct answer to a question, but is it that question you want to know?

    Consider a proof that in the future all people will be happy.

    You can methodically show this to be true but at the same time inadvertently include a proof that the number of people in the future will be zero.

    It doesn't make the claim wrong, it stays undoubtedly true. It's just not what you assume it means.

It is, but it is somewhat worse for machine-generated proofs, especially when the proof is very long and was done by brute force (eg the 4 colour map theorem[1] is the famous example), or depends on a lot of niche results in disparate areas (which LLMs are wont to sometimes do).

Even when the proof is produced by the llm in a formal system like Lean4 it may not be “honest”[2] and it can be hard to tell if the proof is very long and complex and especially if it includes highly specialized results from lots of different areas of maths. Llms can (and do) do this just fine, but for a human proof that would require a team each of which was specialized in a particular area. Those people are more likely to be able to cross-check each other.

[1] https://pubs.ams.org/ebooks/conm/098/ and https://en.wikipedia.org/wiki/Four_color_theorem

[2] An “honest” proof may contain bugs or errors but it does not constitute a deliberate attack on the proof system or the math libraries it uses. Systems like Lean aim to not incorrectly validate an honest proof with mistakes but don’t guarantee anything in the case of a proof being dishonest. This is the sense used here https://lean-lang.org/doc/reference/latest/ValidatingProofs/ .

Your list is cherry picked from the list of "potential threats" to the values of the mathematical research community identified by this document. They aren't criticisms or absolute statements, they're literally a list of potential new problems for the future of mathematical research, and they all seem reasonable to me, if not all at the same levels of magnitude or plausibility.

Notably you don't seem to be looking at either the list of identified values or their recommendations to researchers in their use of LLMs, which would seem much more important to engage with in any non-shallow dismissal of the document as "feel[ing] like gate keeping and resistance to change".

It's also kind of a bad look (and actively harmful for discourse) for people working on AI to be so dismissive of fields actively engaging with how their field is changing due to AI. I haven't seen any other field engaging this actively with its possible futures, have you? Usually we seem to only get some extreme of over-hyped utopia, doomerism, or dismissal of everything as slop.