← Back to context

Comment by nathanrf

3 months ago

Here's a high level overview for a programmer audience (I'm listed as an author but my contributions were fairly minor):

[See specifics of the pipeline in Table 3 of the linked paper]

* There are 181 million ish essentially different Turing Machines with 5 states, first these were enumerated exhaustively

* Then, each machine was run for about 100 million steps. Of the 181 million, about 25% of them halt within this memany step, including the Champion, which ran for 47,176,870 steps before halting.

* This leaves 140 million machines which run for a long time.

So the question is: do those TMs run FOREVER, or have we just not run them long enough?

The goal of the BB challenge project was to answer this question. There is no universal algorithm that works on all TMs, so instead a series of (semi-)deciders were built. Each decider takes a TM, and (based on some proven heuristic) classifies it as either "definitely runs forever" or "maybe halts".

Four deciders ended up being used:

* Loops: run the TM for a while, and if it re-enters a previously-seen configuration, it definitely has to loop forever. Around 90% of machines do this or halt, so covers most.

6.01 million TMs remain.

* NGram CPS: abstractly simulates each TM, tracking a set of binary "n-grams" that are allowed to appear on each side. Computes an over-approximation of reachable states. If none of those abstract states enter the halting transition, then the original machine cannot halt either.

Covers 6.005 million TMs. Around 7000 TMs remain.

* RepWL: Attempts to derive counting rules that describe TM configurations. The NGram model can't "count", so this catches many machines whose patterns depend on parity. Covers 6557 TMs.

There are only a few hundred TMs left.

* FAR: Attempt to describe each TM's state as a regex/FSM.

* WFAR: like FAR but adds weighted edges, which allows some non-regular languages (like matching parentheses) to be described

* Sporadics: around 13 machines had complicated behavior that none of the previous deciders worked for. So hand-written proofs (later translated into Rocq) were written for these machines.

All of the deciders were eventually written in Rocq, which means that they are coupled with a formally-verified proof that they actually work as intended ("if this function returns True, then the corresponding mathematical TM must actually halt").

Hence, all 5-states TMs have been formally classified as halting or non-halting. The longest running halter is therefore the champion- it was already suspected to be the champion, but this proves that there wasn't any longer-running 5-state TM.

In Coq-BB5 machines are not run for 100M steps but directly thrown in the pipeline of deciders. Most halting machines are detected by Loops using low parameters (max 4,100 steps) and only 183 machines are simulated up to 47M steps to deduce halting.

In legacy bbchallenge's seed database, machines were run for exactly 47,176,870 steps, and we were left with about 80M nonhalting candidates. Discrepancy comes from Coq-BB5 not excluding 0RB and other minor factors.

Also, "There are 181 million ish essentially different Turing Machines with 5 states", it is important to note that we end up with this number only after the proof is completed, knowing this number is as hard as solving BB(5).

  • Why is knowing that there are 181 M 5-state machines difficult? Is it not just (read*write*move*(state+halt))^state? About 48^5, modulo "essentially different".

    • Number of 5-state TMs is 21^10 = 16,679,880,978,201; coming from (1 + writemovestate)^2*state; difference with your formula is that in our model, halting is encoded using undefined transition.

      "essentially different" is not a statically-checked property. It is discovered by the enumeration algorithm (Tree Normal Form, see Section 3 of https://arxiv.org/pdf/2509.12337); in particular, for each machine, the algorithm needs to know it if will ever reach an undefined transition because if it does it will visit the children of that machine (i.e. all ways to set that reached undefined transition).

      Knowing if an undefined transition will be ever reached is not computable, hence knowing the number of enumerated machines is not computable in general and is as hard as solving BB(5).

      3 replies →

    • Just to directly address this - the key thing wrong in this formula is that it's ^2*state, not ^state. The state transition table operates both on the current state and the input you read from the tape, so for a binary turing machine with 5 states, you have a 10-entry transition table.

      2 replies →

    • My intuition is that this would include every 4 state (and fewer) machine that has a fifth state that is never used. For every different configuration of that fifth state I would consider the machine to be essentially the same.

Thank you for the write-down, it is very interesting!

Is there some reason why Rocq is the proof assistant of choice and not one of the others?

also.... https://www.youtube.com/watch?v=c3sOuEv0E2I

  • It is as simple as: the person who contributed the proofs/implementations chose Rocq.

    I did some small proofs in Dafny for a few of the simpler deciders (most of which didn't end up being used).

    At the time, I found Dafny's "program extraction" (i.e. transpilation to a "real" programming language) very cumbersome, and produced extremely inefficient code. I think since then, a much improved Rust target has been implemented.

  • Its focus around constructible objects does lend itself well to the kind of proofs needed for non-halting. Usually they involve lemmas about the 'syntax' of a TM's configuration and how it evolves over time, and the actual number-based math is relatively simple. (With the exception of Skelet #17, which involves fiddly invariants of finite sequences.)

So how many sporadics would be left if we run the same on 6-state TM?

  • Very many - I think the number is several thousand. Several sporadic 6-state machines have been solved, though, and there are currently about 2400(?) unsolved machines. Among these are several Cryptids, machines whose halting problem is known to be mathematically hard

    • You guys need a blog to chase these down.

      If nothing else, it’ll inspire the next generation of mathematicians.

> There is no universal algorithm that works on all TMs

Is this suspected or proven? I would love to read the proof if it exists.

  • Scooping the Loop Snooper (an elementary proof of the undecidability of the halting problem)

    No program can say what another will do. Now, I won't just assert that, I'll prove it to you: I will prove that although you might work til you drop, you can't predict whether a program will stop.

    Imagine we have a procedure called P that will snoop in the source code of programs to see there aren't infinite loops that go round and around; and P prints the word "Fine!" if no looping is found.

    You feed in your code, and the input it needs, and then P takes them both and it studies and reads and computes whether things will all end as the should (as opposed to going loopy the way that they could).

    Well, the truth is that P cannot possibly be, because if you wrote it and gave it to me, I could use it to set up a logical bind that would shatter your reason and scramble your mind.

    Here's the trick I would use - and it's simple to do. I'd define a procedure - we'll name the thing Q - that would take and program and call P (of course!) to tell if it looped, by reading the source;

    And if so, Q would simply print "Loop!" and then stop; but if no, Q would go right back to the top, and start off again, looping endlessly back, til the universe dies and is frozen and black.

    And this program called Q wouldn't stay on the shelf; I would run it, and (fiendishly) feed it itself. What behaviour results when I do this with Q? When it reads its own source, just what will it do?

    If P warns of loops, Q will print "Loop!" and quit; yet P is supposed to speak truly of it. So if Q's going to quit, then P should say, "Fine!" - which will make Q go back to its very first line!

    No matter what P would have done, Q will scoop it: Q uses P's output to make P look stupid. If P gets things right then it lies in its tooth; and if it speaks falsely, it's telling the truth!

    I've created a paradox, neat as can be - and simply by using your putative P. When you assumed P you stepped into a snare; Your assumptions have led you right into my lair.

    So, how to escape from this logical mess? I don't have to tell you; I'm sure you can guess. By reductio, there cannot possibly be a procedure that acts like the mythical P.

    You can never discover mechanical means for predicting the acts of computing machines. It's something that cannot be done. So we users must find our own bugs; our computers are losers!

    by Geoffrey K. Pullum Stevenson College University of California Santa Cruz, CA 95064

    From Mathematics Magazine, October 2000, pp 319-320.

    • In principle, analysts could could analyze all machines of size K or less, classifying many of them as being halting or not, and all the rest could in fact be non-halting, but the analyst would never (at any finite time) know for sure whether they are all non-halting.

  • This is proven. It's known as the halting problem and is the central pillar of computational complexity theory. The proof was invented by Alan Turing and is online.

  • This is just the Halting Problem, many good introductory expositions of the proof exist.

how much computing time does it take to evaluate a machine to 100 million steps on a modern epyc system?