Peano arithmetic is enough, because Peano arithmetic encodes computation

1 month ago (math.stackexchange.com)

This is a Math StackExchange question that I turned into a blog post.

It covers both the limits of what can be proven in the Peano Axioms, and how one would begin bootstrapping Lisp in the Peano Axioms. All of the bad jokes are in the second section.

Corrections and follow-up questions are welcome.

  • After putting on my boots and wading through all of that I think that you have one edit to make.

    In the "Why Lisp?" section there is a bit of basic logic defined. The first of those functions appears to have unbalanced parentheses.

    > (defun not (x)
    >   (if x
    >     false
    >     true)

    I have a compulsion that I can't control when someone starts using parentheses. I have to scan to see who does what with who.

    You later say in the same section

    >But it is really easy to program a computer to look for balanced parentheses

    Okay. This is pretty funny. Thanks for the laugh. I realize that you weren't doing that but it still is funny that you pointed out being able to do it.
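
    For what it's worth, the claim is easy to make good on. Here is a minimal sketch of a paren checker in Common Lisp (my own illustration, not code from the post; it ignores strings, comments, and other reader syntax):

    ```lisp
    ;; Minimal sketch: scan a string, counting open parens.  The count must
    ;; never go negative and must end at zero.
    (defun balanced-parens-p (text)
      (let ((depth 0))
        (loop for ch across text do
          (case ch
            (#\( (incf depth))
            (#\) (decf depth)))
          (when (minusp depth)
            (return-from balanced-parens-p nil)))
        (zerop depth)))

    ;; (balanced-parens-p "(defun not (x) (if x false true)") ; => NIL, one paren short
    ```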

    This later comment in the "Basic Number Theory" section was also funny.

    >; After a while, you stop noticing that stack of closing parens.

    I really enjoyed reading this post. Great job. Though it has been a long time since I did anything with Lisp I was able to walk through and get the gist again.

    • Thank you for the correction!

      And I'm glad that someone liked some of my attempts at humor. As my wife sometimes says, "I know that you were trying to make a joke, because it wasn't funny."

      3 replies →

    • > I have to scan to see who does what with who.

      Are you saying that parentheses introduce the problem of having to scan to see what goes with what?

      As in, if we don't have parentheses, but still have recursive/nested structure, we don't have to scan?

      3 replies →

  • Thanks for this. In another strange internet coincidence, I was asking ChatGPT to break down the fundamentals of the Peano axioms just yesterday and now I see this. Thumbs up!

    • I suspect in the near future (if not already) ChatGPT data will be sold to data brokers and bought by Amazon such that writing a prompt will end up polluting Alexa product recommendations within a few minutes to hours.

      12 replies →

    • Well there was a post on mathmemes a day ago about teaching kids set theory as a foundation for math with some discussion of PA. So maybe related ideas are echoing across the intertubes in this moment?

      1 reply →

    • I've noticed this too. I will be researching a topic elsewhere and then it seems to pop up in HN. Am I just looking for patterns where there are none, or is there some trickery happening where HN tracks those activities and mixes in posts more relevant to my interests with the others?

      2 replies →

  • This is fascinating! I haven't read much past the intro yet, but I find the whole premise remarkable: within PA you can prove that each specific Goodstein sequence terminates at 0, yet you can't prove that all of them terminate (it's a trivial result but still interesting).

    I also find it super weird that Peano axioms are enough to encode computation. Again, this might be trivial if you think about it, but that's one self-referential layer more than I've thought about before.
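
    A worked instance, for concreteness (my own illustration, not from the post): PA can verify any one such computation, e.g. the sequence starting at 3, which goes 3, 3, 3, 2, 1, 0; Goodstein's theorem is the universally quantified statement over all starting values at once.

    ```latex
    % Goodstein sequence starting at 3, in hereditary base notation:
    \begin{aligned}
    G_1 &= 3 = 2^1 + 1        && \text{write in hereditary base 2}\\
    G_2 &= (3^1 + 1) - 1 = 3  && \text{bump the base to 3, subtract 1}\\
    G_3 &= 4^1 - 1 = 3        && \text{bump the base to 4, subtract 1}\\
    G_4 &= 3 - 1 = 2          && \text{since } 3 < 5 \text{, bumping the base changes nothing}\\
    G_5 &= 2 - 1 = 1\\
    G_6 &= 1 - 1 = 0
    \end{aligned}
    ```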

    One question for you, btilly: oddly enough, I recently decided to learn more set theory, and just last week worked through an Intro to Set Theory textbook up to Goodstein sequences. I'm a bit past that now.

    Do you have a good recommendation for a second, advanced Set Theory textbook? Also, any recommendation for a textbook that digs into Peano arithmetic? (My mini goal since learning the proof for Goodstein sequences is to work up to understanding the proof that Peano isn't strong enough to prove Goodstein's theorem, though I'll happily take other paths instead if they're strongly recommended.)

  • Boot sector Lisp bootstraps itself.

    https://justine.lol/sectorlisp2/

    Also, several of the Lisps from https://t3x.org implement numerals (and the rest of the language) from cons cells and apply/eval:

    'John McCarthy discovered an elegant self-defining way to compute the above steps, more commonly known as the metacircular evaluator. Alan Kay once described this code as the "Maxwell's equations of software". Here are those equations as implemented by SectorLISP:

    ASSOC EVAL EVCON APPLY EVLIS PAIRLIS '

    Ditto with some Forths.

    Back to T3X: the author also has Zenlisp, where the meta-circular evaluator is basically a demonstration of how to define eval/apply and how the two call each other recursively.

    http://t3x.org/zsp/index.html

    • I knew that this sort of stuff was possible, but it is fun to see it.

      When it comes to bootstrapping a programming language from nothing, the two best options are Lisp and Forth. Of the two, I find Lisp easier to understand.

  • Hey! This is fantastic and actually ties in some highly disparate parts of math. Basically, reorient & reformulate all of math/epistemology around discrete sampling of the continuum. Invert our notions of Aleph/Beth/Betti numbers as some sort of triadic Grothendieck topoi that encode our human brain's sensory instruments, which nucleate discrete samples of the continuum of reality (ontology).

    Then every modal logic becomes some mapping of 2^(N) to some set of statements. The only thing that matters is how predictive they are with some sort of objective function/metric/measure but you can always induce an "ultra metric" around notions of cognitive complexity classes i.e. your brain is finite and can compute finite thoughts/second. Thus for all cognition models that compute some meta-logic around some objective F, we can motivate that less complex models are "better". There comes the ultra measure to tie disparate logic systems. So I can take your Peano Axioms and induce a ternary logic (True, False, Maybe) or an indefinite-definite logic (True or something else entirely). I can even induce bayesian logics by doing power sets of T/F. So a 2x2 bayesian inference logic: (True Positive, True Negative, False Positive, False Negative)

    Fun stuff!

    Edit: The technical tl;dr that I left out is the unification of all math imho: algebraic topology + differential geometry + tropical geometry + algebraic analysis. D-modules and Microlocal Calculus from Kashiwara and the Yoneda lemma encode all of epistemology as relational: either between objects or the interaction between objects, defined as collisionless Planck hypervolumes.

    basically encodes the particle-wave duality as discrete-continuum and all of epistemology is Grothendieck topoi + derived categories + functorial spaces between isometry of those dual spaces whether algebras/coalgebra (discrete modality) or homologies/cohomologies (continuous actions)

    Edit 2: The thing that ties everything together is Noether's symmetry/conserved quantities which (my own wild ass hunch) are best encoded as "modular forms", arithmetic's final mystery. The continuous symmetry I think makes it easy to think about diffeomorphisms from different topoi by extracting homeomorphisms from gauge invariant symmetries (in the discrete case it's a lattice, but in the continuous we'd have to formalize some notion of liquid or fluid bases? I think Kashiwara's crystal bases has some utility there but this is so beyond my understanding )

    • > Invert our notions of Aleph/Beth/Betti numbers as some sort of triadic Grothendieck topoi that encode our human brain's sensory instruments that nucleate discrete samples of continuum of reality (ontology)

      There’s probably ten+ years of math education encoded in this single sentence?

      17 replies →

    • You know what, since you put in all that work, here's my version using p-adic geometry to generalize the concept of time as a local relativistic "motive" (from category theory) notion of ordering (i.e. analogous to Grothendieck's generalization of functions as being point samples along curves of a basis of distributions to generalize notions of derivatives):

      https://github.com/ikrima/topos.noether/blob/aeb55d403213089...

The interesting part to me (I have a background in both math and programming) isn't so much the encoding of computation but that one can work around the independence of Goodstein's theorem in this self-referential way. I think this implies that PA+"PA is omega-consistent" can prove Goodstein's theorem, and perhaps can more generally do transfinite induction up to epsilon_0? EDIT: I think just PA+"PA is consistent" is enough?

  • Asker of the SO question here: I edited the question to link to a few other answers on this kind of thing. Essentially, "PA is consistent" is not enough, but a "uniform reflection principle" that says "if PA proves something, it's true" is enough. I'm not 100% certain that this principle is equivalent to omega-consistency, but if I'm reading this correctly, it should be:

    https://en.wikipedia.org/wiki/%CE%A9-consistent_theory#Relat...

    The Wikipedia article says T is omega-consistent if "T + RFN_T + the set of all true sentences is consistent", which should mean the same thing as "T + RFN_T is true".
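
    For readers who want the principle spelled out: uniform reflection over T is the standard schema below, one instance per formula φ, where Prov_T is the arithmetized provability predicate and the dot marks the numeral of x (this is the textbook formulation, not anything specific to the linked answer):

    ```latex
    % Uniform reflection schema RFN_T (one instance per formula \varphi):
    \mathrm{RFN}_T:\qquad \forall x\,\bigl(\mathrm{Prov}_T(\ulcorner \varphi(\dot{x}) \urcorner) \rightarrow \varphi(x)\bigr)
    ```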

  • I also like the recursion. In essence, you're making a meta-proof about what PA proves, and given that you trust PA, you also trust this meta-proof.

    > I think just PA+"PA is consistent" is enough?

    It's not clear to me how. I believe PA+"PA is consistent" would allow a model where Goodstein's theorem is true for the standard natural numbers, but that also contains some nonstandard integer N for which Goodstein's theorem is false. I think that's exactly the case that's ruled out by the stronger statement of ω-consistency.

  • Unfortunately not, and apparently no other purely universally quantified formulas will do either (so this is a more general thing, not specific to Con(PA)): https://math.stackexchange.com/questions/5003237/can-goodste...

    On the first question, how do you encode omega-consistency as a formula of PA? Just curious, it's not at all obvious to me.

    • > On the first question, how do you encode omega-consistency as a formula of PA? Just curious, it's not at all obvious to me.

      I was also wondering about that, but going by the Wikipedia definition, it doesn't seem too complicated: you say, "For all encoded propositions P(x): If for every x there exists an encoded proof of P(x), then there does not exist an encoded proof of 'there exists an x such that ¬P(x)'." That is, if you can prove a proposition for each integer in the metatheory, then quantifiers within the target theory must respect that.
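
      Roughly, as a single sentence of PA you quantify over Gödel codes of formulas and use the arithmetized proof predicate Prf; a sketch of the shape, following the Wikipedia-style definition (the dot marks the numeral of x):

      ```latex
      % "PA is omega-consistent", arithmetized schematically:
      \forall \ulcorner P \urcorner\;
        \Bigl[\bigl(\forall x\,\exists p\,\mathrm{Prf}(p,\ulcorner P(\dot{x})\urcorner)\bigr)
          \rightarrow \neg\exists q\,\mathrm{Prf}\bigl(q,\ulcorner \exists x\,\neg P(x)\urcorner\bigr)\Bigr]
      ```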

      3 replies →

  • I think so; the Math StackExchange post mentions that PA + transfinite induction up to epsilon_0 proves the consistency of PA. It seems likely to me that PA + "PA is consistent" would be able to prove transfinite induction up to epsilon_0.

  • Now we're getting a little beyond the detail that I feel comfortable making statements about.

    ChatGPT tells me that PA+"PA is consistent" is not quite enough. I believe that it has digested enough logic textbooks that I'll believe that claim.

I was pretty amazed by how expressive Peano Arithmetic is when I first started using it. At first it seems like a very basic system, but once you realize that computation itself can be encoded inside PA, and that PA can therefore simulate many kinds of computation, things that felt complicated start to click.

Does anyone have recommendations for beginner-friendly resources that explain these encoding techniques?
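
Not a resource recommendation, but one standard gadget that makes these encodings click: Gödel's β-function plus the Chinese Remainder Theorem lets a single pair of numbers stand in for any finite sequence, which is the key step in talking about computation histories inside PA (a sketch of the standard construction, not something from the thread):

```latex
% Goedel's beta function:
\beta(a, b, i) = a \bmod \bigl(1 + (i + 1)\,b\bigr)
% By the Chinese Remainder Theorem, for any finite sequence x_0, \dots, x_n
% there exist a, b such that \beta(a, b, i) = x_i for every i \le n.
```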

This is very like Boyer-Moore theory[1], which builds up mathematics from the Peano axiom level.

Boyer and Moore wrote an automated theorem prover that goes with this theory. I have a working copy for GNU Common LISP at [2].

"It is perhaps easiest to think of our program much as one would think of a reasonably good mathematics student: given the axioms of Peano, he could hardly be expected to prove (much less discover) the prime factorization theorem. However, he could cope quite well if given the axioms of Peano and a list of theorems to prove (e.g., “prove that addition is commutative,” . . . “prove that multiplication distributes over addition,” . . . “prove that the result returned by the GCD function divides both of its arguments,” . . . “prove that if the products over two sequences of primes are identical, then the two sequences are permutations of one another”)." - Boyer and Moore

[1] https://www.cs.utexas.edu/~boyer/acl.pdf

[2] https://github.com/John-Nagle/nqthm/tree/master
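
A sketch of what stating one of those theorems looks like as input to the prover, going from memory of Nqthm's event syntax (the exact names and rule classes may differ):

```lisp
;; Define addition by recursion on the first argument, Nqthm-style.
(DEFN PLUS (X Y)
  (IF (ZEROP X)
      (FIX Y)
      (ADD1 (PLUS (SUB1 X) Y))))

;; Ask the prover to show commutativity and store the result as a rewrite rule.
(PROVE-LEMMA COMMUTATIVITY-OF-PLUS (REWRITE)
  (EQUAL (PLUS X Y) (PLUS Y X)))
```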

Your comment to JoJoModding in Math Stackexchange is incorrect.

"That's because there are nonstandard models of PA which contain infinite natural numbers. So PA may be able to prove that it produces a proof, but can't prove that the proof is of finite length. And therefore it might not be a valid proof."

If PA proves "PA proves X" then PA can prove X. The key observation is not that there are nonstandard models, but rather that the standard natural numbers are a model of PA.

Therefore if PA proves "PA proves X", then there is in fact a standard, finite natural number that encodes a proof of X. Decoding that finite natural number gives an actual proof of X in PA.
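
Spelled out as a chain (a sketch; Prov and Prf are the usual arithmetized provability and proof predicates, and the first step uses the metatheoretic fact that the standard naturals satisfy PA):

```latex
\mathrm{PA} \vdash \mathrm{Prov}_{\mathrm{PA}}(\ulcorner X \urcorner)
  \;\Rightarrow\; \mathbb{N} \models \exists p\,\mathrm{Prf}_{\mathrm{PA}}(p, \ulcorner X \urcorner)
  \;\Rightarrow\; \text{some standard } n \text{ encodes a PA-proof of } X
  \;\Rightarrow\; \mathrm{PA} \vdash X
```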

  • I haven't had the time to go through your argument in more detail, but it's important to note (because the natural language version of what you've presented is ambiguous) that you haven't shown "PA proves 'Provable(forall n, G(n))'" in which case it would be the case that in fact "PA proves 'forall n, G(n)'", but rather "PA proves 'forall n, Provable(G(n))'".

    My logic is very rusty at this point, but if someone could give me an argument that you cannot move the 'Provable' outside the 'forall', I would really appreciate that, without making reference to Goodstein sequences. In other words, that in general for propositions 'P' it is not true that proving "forall n, Provable(P(n))" implies you can prove "Provable(forall n, P(n))".

  • > If PA proves "PA proves X" then PA can prove X.

    Not true.

    From PA we can construct a function that can search all possible proofs that can be constructed in PA. In fact I outlined one way to do this at the end of my answer.

    With this function, we can construct a function will-return that analyzes whether a given function, with a given input, will return. This is kind of like an attempted solution to the Halting Problem. We know that it doesn't always work. But we also know that it works a lot of the time.

    From will-return we can create a function opposite-return that tries to return if a given function with a given input would not, and doesn't return if that function would. This construction is identical to the one in the standard proof of the Halting Problem.

    Now we consider (opposite-return opposite-return opposite-return). (Actually you need a step to expand the argument into this recursive form. I've left that out, but that is identical to the one in the standard proof of the Halting problem.)

    PA can prove the following:

    - PA proves that if PA can prove that opposite-return returns, then it doesn't.
    - PA proves that if PA can prove that opposite-return doesn't return, then it does.
    - PA proves that if it can prove everything that it proves that it can prove, then PA must have a proof of one of the two previous statements.
    - Therefore PA proves that if it can prove everything that it proves that it can prove, then PA is inconsistent.

    This is a form of Gödel's second incompleteness theorem.

    And, therefore, there must be a distinction to be made between "PA proves" and "PA proves that it proves".
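
    A rough Lisp sketch of the construction described above (illustrative only: proof-of-return-p and proof-of-divergence-p stand in for the search over encoded PA proofs, which is assumed rather than implemented, and the arity/expansion step mentioned above is glossed over):

    ```lisp
    ;; will-return enumerates candidate PA proofs; it may itself loop forever
    ;; if PA proves neither that (f x) returns nor that it doesn't.
    (defun will-return (f x)
      (loop for n from 0
            when (proof-of-return-p n f x) return t
            when (proof-of-divergence-p n f x) return nil))

    ;; opposite-return returns iff PA proves (f x) does not return,
    ;; and loops forever if PA proves that it does.
    (defun opposite-return (f x)
      (if (will-return f x)
          (loop)  ; PA proved (f x) returns, so diverge
          t))     ; PA proved (f x) doesn't return, so return

    ;; The diagonal case considered above:
    ;; (opposite-return #'opposite-return #'opposite-return)
    ```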

    • I am not sure I follow and it is possible I may not be on the same wavelength, but here is another thought.

      For any sentence S, if first-order PA (or any first-order theory) proves S, then that means that S holds true in every model of that theory, via the completeness (not incompleteness) theorem.

      The statement "PA proves x" is equivalent to saying "there exists a natural number N which is a Gödel number encoding a proof of x." The letter "x", here, is a natural number that is assumed to encode some sentence we want to prove, that is, x = #S for some sentence S.

      The above is a predicate with one existential quantifier: Provable(x) = there exists N such that IsProof(N, x) holds true, where IsProof says that N is the Gödel number of a valid proof ending in x.

      If PA proves "Provable(x)", that means that the predicate "Provable(x)" holds in every model of PA. This means every model of PA would have some natural number N that satisfies IsProof(N, x). Of course, this number can vary from model to model.

      However: the standard model, which has only standard natural numbers, is also another model of PA. So if PA proves "Provable(x)," and Provable(x) thus holds in every model, it must also hold in the standard model of PA. This means that there must be a regular, standard, finite natural number N_st that encodes a proof of x.

      Since the standard model is an initial segment of every other model of PA, every other model includes the standard model and all the standard natural numbers. Thus, if PA proves Provable(x), then the standard number N_st must exist in all models.

      So we cannot have a situation where PA proves Provable(x) but all proofs of x must be nonstandard and infinite in length. This would mean no such proof exists in the standard model of PA - but then, via completeness, PA would not be able to prove "Provable(x)".

      3 replies →

    • > > If PA proves "PA proves X" then PA can prove X.

      If we assume PA to be sound, then indeed everything it proves is true.

      > Not true.

      Now you're saying PA is unsound.

      But your article wasn't about PA proves "PA proves X", it was about "forall n : PA proves G(n)".

      For PA not to prove "forall n: G(n)", there is no soundness issue, only an ω-consistency issue.

      7 replies →

    • > PA proves that if it can prove everything that it proves that it can prove, then PA must have a proof of one of the two previous statements.

      I don't believe this is true. I don't know what result you're using here, but I think you're mixing up "provable" and "true".

      In particular your line of reasoning violates Löb's theorem, which is a corollary of the second incompleteness theorem.

  • The standard model is a model of PA only if PA is consistent, which PA cannot prove unless it is inconsistent (Gödel's second incompleteness theorem). So your proposed proof doesn't work in PA itself, which is the point of that comment, I believe.

    • I believe my proof actually works in systems even weaker than PA (but the metatheory of how exactly to do the encoding in a first-order theory is a bit of a headache for me at the moment).

      In particular I'm not relying on the consistency of PA. If PA is inconsistent, "If PA proves 'PA proves X' then PA can prove X" also holds (trivially), because then PA can prove anything.

      6 replies →

I was talking to someone about inductive data types, and showed them the zero/succ definition of `Nat`, e.g. in Lean or Rocq.

It was interesting because they were asking "is this all you need? What about the Peano axioms? Is there anything more primitive than the inductive data type?"

I bring it up because it's good not to take for granted that the Peano axioms are inherent, instead of just one design among many.
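
For reference, the zero/succ definition being discussed looks like this in Lean 4 (written as MyNat here so it doesn't shadow the built-in Nat; addition added as a minimal example):

```lean
-- The inductive zero/succ naturals.
inductive MyNat where
  | zero : MyNat
  | succ (n : MyNat) : MyNat

-- Addition by recursion on the second argument, mirroring the Peano-style definition.
def MyNat.add : MyNat → MyNat → MyNat
  | m, MyNat.zero   => m
  | m, MyNat.succ n => MyNat.succ (MyNat.add m n)
```

Induction then comes "for free" as the recursor generated for the declaration, which is where the Peano induction axiom reappears.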

  • > Is there anything more primitive than the inductive data type?

    I believe that the natural numbers are more primitive than inductive data types, since all inductive data types may be constructed from the natural numbers alongside a small number of primitive type formers (e.g. Π, Σ, = and Ω).

    • You don't need all the natural numbers for that, though. I think you need 0 and 1 only?

      I think there are two primitive sets for dependent type theory. The one with omega, and then the one with inductive types. None of them need axioms like the Peano axioms.

      1 reply →

Related to PA consistency, it can be proven in PA. https://youtu.be/6pjLmmkZnIA

  • This definitely needs some context for the non-logicians in the house! Gödel's second incompleteness theorem shows that, if PA can prove its own consistency, then PA is inconsistent (and can therefore prove anything, including false things).

    The work linked here doesn't show that PA is inconsistent, however: what it does is to define a new, weaker notion of what it means for PA to “prove its own consistency” and to show that PA can do that weaker thing.

    Interesting work for sure, but it won't mean anything to you unless you already know a lot of logic.

This topic has 123 points, but the linked post on SO only has 11 upvotes.

  • Stack Overflow requires you to get 15 points before you can upvote things.

    I imagine their reputation problem (nobody wants to post things there because it just gets deleted) plus the 15 point requirement precludes many people from upvoting over there.

But is computation enough? The computable reals are a subset of the reals.

  • "Reals" (tragically poorly named) can be interpreted as physical ratios.

    That is: Real numbers describe real, concrete relations. For example, saying that Jones weighs 180.255 pounds means there's a real, physical relationship -- a ratio -- between Jones' weight and the standard pound. Because both weights exist physically, their ratio also exists physically. Thus, from this viewpoint, real numbers can be viewed as ratios.

    In contrast, the common philosophical stance on numbers is that they are abstract concepts, detached from the actual physical process of measurement. Numbers are seen as external representations tied to real-world features through human conventions. This "representational" approach, influenced by the idea that numbers are abstract entities, became dominant in the 20th century.

    But the 20th century viewpoint is really just one interpretation (you could call it "Platonic"), and, just as it's impossible to measure ratios to infinite precision in the real world, absolutely nothing requires an incomputable continuum of reals.

    Physics least of all. In 20th and 21st century physics, things are discrete (quantized) and are very rarely measured to over 50 significant digits. Infinite precision is never allowed, and precision to 2000 significant digits is likewise impossible, and not only because quantum mechanics rules out great precision at very small scales. For example, imagine measuring the orbits of the planets and moons in the solar system: by the time you get to 50 significant digits, you will need to take into account the gravitational effects of the stars nearest to the sun; before you get to 100 significant digits, you'll need to model the entire Milky Way galaxy; the further you go in search of precision, the exponentially larger your mathematical canvas will need to grow, and at arbitrarily high sub-infinite precision you'd be required to model the whole of the observable universe -- which might itself be futile, as objects and phenomena outside observable space could affect your measurements, and so on. So although everything is in principle simulatable, and precision has a set limit in a granular universe that can be described mathematically, measuring anything to arbitrarily high precision is beyond finite human efforts.

    • As far as we know, quantum mechanics does not have a granularity. You can measure any single value to arbitrary precision; you just have limits on measuring two things simultaneously.

      Granularity is implied by some, but not all, post standard model physics. It's a very open question.

      2 replies →

    • > absolutely nothing requires an incomputable continuum of reals.

      In a sense this is wrong. Some say the indispensability of modern math, which includes the incomputable reals, shows that such abstract objects are required for science.

      3 replies →

    • Numbers interpretable as ratios are the Rational numbers, by definition, not the Reals.

      This entire discussion is about mathematical concepts, not physical ones!

      Sure, yes, in physics you never "need" to go past a certain number of digits, but that has nothing to do with mathematical abstractions such as the types numbers. They're very specifically and strictly defined, starting from certain axioms. Quantum mechanics and the measurability of particles has nothing to do with it!

      It's also an open question how much precision the Universe actually has, such as whether things occur at a higher precision than can be practically measured, or whether the ultimate limit of measurement capability is the precision that the Universe "keeps" in its microscopic states.

      For example, let's assume that physics occurs with some finite precision -- so not the infinite precision reals -- and that this precision is exactly the maximum possible measurable precision for any conceivable experiment. That is: Information is matter. Okay... which number space is this? Booleans? Integers? Rationals? In what space? A 3D grid? Waves in some phase space? Subdivided... how?

      Figure that out, and your Nobel prize awaits!

      5 replies →

    • The set of values of any physical quantity must have an algebraic structure that satisfies a set of axioms that include the axioms of the Archimedean group (which include the requirements that it must be possible to compare, add and subtract the values of that physical quantity).

      This requirement is necessary to allow the definition of a division operation, which has as operands a pair of values of that physical quantity, and as result a scalar a.k.a. "real" number. This division operation, as you have noticed, is called "measurement" of that physical quantity. A value of some physical quantity, i.e. the dividend in the measurement operation, is specified by writing the quotient and the divisor of the measurement, e.g. in "6 inches", "6" is the quotient and "inch" is the divisor.

      In principle, this kind of division operation, like any division, could have its digit-generating steps executed infinitely, producing an approximation as close as desired for the value of the measured quantity, which is supposed to be an arbitrary scalar, a.k.a. "real" number. Halting the division after a finite number of steps will produce a rational number.

      In practice, as you have described, the desire to execute the division in a finite time is not the only thing that limits the precision of the measured values, but there are many more constraints, caused by the noise that could need longer and longer times to be filtered, by external influences that become harder and harder to be suppressed or accounted for, by ever greater cost of the components of the measurement apparatus, by the growing energy required to perform the measurement, and so on.

      Nevertheless, despite the fact that the results of all practical measurements are rational numbers of low precision, normally representable as FP32 (only measurements done in a few laboratories around the world, using extremely expensive equipment, require an FP64 or extended-precision representation), it is still preferable to model the set of scalars using the traditional axioms of the continuous straight line, i.e. of the "real" numbers.

      The reason is that this mathematical model of a continuous set is actually much simpler than attempting to model the sets of values of physical quantities as discrete sets. An obvious reason why the continuous model is simpler is that you cannot find discretization steps that are good both for the side and for the diagonal of a square, which has stopped the attempts of the Ancient Greeks to describe all quantities as discrete. Already Aristotle was making a clear distinction between discrete quantities and continuous quantities. Working around the Ancient Greek paradox requires lack of isotropy of the space, i.e. discretization also of the angles, which brings a lot of complications, e.g. things like rigid squares or circles cannot exist.

      The base continuous dynamical quantities are the space and time, together with a third quantity, which today is really the electric voltage (because of the convenient existence of the Josephson voltage-frequency converters), even if the documents of the International System of Units are written in an obfuscated way that hides this, in an attempt to preserve the illusion that the mass might be a base quantity, like in the older systems of units.

      In any theory where some physical quantities that are now modeled as continuous, were modeled as discrete instead, the space and time would also be discrete. There have been many attempts to model the space-time as a discrete lattice, but none of them has produced any useful result. Unless something revolutionary will be discovered, all such attempts appear to be just a big waste of time.

      5 replies →

    • Your whole comment can just be TL;DR Forth and the fixed point philosophy :)

      Rationals > irrationals when it comes to computing them. You can always approximate irrationals with rationals; even Scheme (Lisp, d'oh) has a function to convert a rational to a decimal and the reverse, a decimal to a rational.
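
      For instance, in a typical Scheme REPL (R5RS-style names; newer Schemes also spell these exact / inexact, and behaviour varies a little by implementation):

      ```scheme
      (exact->inexact 1/3)   ; => 0.3333333333333333 (approximately)
      (inexact->exact 0.5)   ; => 1/2
      (rationalize (inexact->exact 0.333) 1/100)  ; => 1/3, the simplest rational within the tolerance
      ```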

  • This is an underspecified question until some observable goal is attached to "enough". Enough for what?

  • I like to think about it like this: while real numbers in general are impossible to compute, write down or do anything else with them, many statements about real numbers can be expressed in a useful, computable form.

    It all gets mind-bendingly mind-bending really fast, especially with people like Harvey Friedman ( https://u.osu.edu/friedman.8/foundational-adventures/fom-ema... ) or the author of this post trying to break the math by constructing intractably large values with the simplest means possible (thus showing that you can encounter problems that do not fit in the universe even when working in a "simple" theory)

    (I saw the username and went to check the AudioMulch website, but it did not resolve :( )

  • > But is computation enough?

    Of course not, but that would invalidate the entire project of some/many here to turn reality into something clockwork they can believe they understand. Reality is much more interesting than that.