Comment by eru

1 day ago

> Also I think at some point the halting problem will make some programs impossible to test.

No, not at all. The halting problem isn't much of a problem here.

To elaborate: yes, it's pretty much impossible to decide whether an arbitrary programme will halt. But we aren't dealing with arbitrary programmes: you have your agent carefully craft programmes that are easy to prove correct.

There are languages available today whose type systems already only let you write terminating programmes. See eg the Dhall language (https://news.ycombinator.com/item?id=32102203), or Agda, or Lean itself (unless you specifically opt out via the 'partial' keyword, and it's trivial to check whether someone used 'partial').
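
Roughly, in Lean 4 the distinction looks like this (a minimal sketch; the Collatz function is just a stand-in for "termination is not obvious"):

    -- Structural recursion: Lean's termination checker accepts this as total.
    def sumTo : Nat → Nat
      | 0     => 0
      | n + 1 => (n + 1) + sumTo n

    -- Nobody knows whether this halts on every input, so Lean refuses it as a
    -- plain `def`; it only compiles with the `partial` escape hatch.
    partial def collatz (n : Nat) : Nat :=
      if n ≤ 1 then n
      else if n % 2 == 0 then collatz (n / 2)
      else collatz (3 * n + 1)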

If your agent writes a programme that's not easy to prove terminating, you don't try harder to prove it. You just flag that as an error and have the agent try again.

Just like, as a human code reviewer, you reject pull requests that are too complicated to understand: you don't even bother figuring out whether they are technically correct or not.
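
As a sketch of that gate (the helper names here are made up; the only real Lean APIs used are String.splitOn and Except), you don't even need the prover to reject an opt-out:

    -- Hypothetical review gate: reject any submission that opts out of Lean's
    -- termination checker. A crude textual check, but cheap and conservative.
    def usesPartial (src : String) : Bool :=
      (src.splitOn "partial def").length != 1

    -- Either accept the source, or hand an error back to the agent to retry.
    def reviewSubmission (src : String) : Except String String :=
      if usesPartial src then
        .error "opts out of the termination checker; please try again"
      else
        .ok src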

Thanks. I expressed it with doubt, because I assumed there had to be some way around it.

If I understood correctly, the halting problem states you cannot make a program that tests any arbitrary program. But you can if you only test a relatively small, possibly finite, well-defined subset of programs, right?

  • > If I understood correctly, the halting problem states you cannot make a program that tests any arbitrary program. But you can if you only test a relatively small, possibly finite, well-defined subset of programs, right?

    In the standard theories, everything becomes trivial once you go finite.

    The trick is not so much that you have a well-defined subset, but that you allow your programme analyser to say 'Eh, I don't know'.

    The halting problem is impossible to solve in general if your analyser has to answer "Definitely halts" or "Definitely runs forever" for any given programme you present it with. If you allow a third option, "Eh, I don't know", the problem becomes solvable.

    Trivially, you might always say "Eh, I don't know" and never be wrong. But we can write useful analysers that try their best not to say "Eh, I don't know". (There's a small sketch of this three-answer interface at the end of this comment.)

    One example is type checkers in compilers. When they detect that you are eg trying to add numbers and strings, something is definitely wrong with your programme. But when they let your programme through, it doesn't mean your programme is flawless.

    The opposite is what you get with proof assistants like we were discussing: if they prove your programme correct, it definitely meets the specs. But if they fail, that doesn't mean your programme is definitely incorrect; they might have just failed to find a proof. (There's a toy example of both guarantees at the end of this comment.)

    There doesn't need to be a neat and simple definition of which programmes your theorem prover works on vs fails on. Eg if you upgrade the AI agent that helps you find the proof (or hire a smarter human to drive it), they might suddenly find a proof where they struggled previously.
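
    To make the three-answer idea concrete, here's a rough Lean-style sketch (the Verdict type and the analyser names are made up for illustration):

        -- The only three answers a sound termination analyser may give.
        inductive Verdict where
          | halts        -- definitely terminates
          | runsForever  -- definitely loops
          | unknown      -- "Eh, I don't know"

        -- The trivial analyser: never commits, so it's never wrong (and never useful).
        def trivialAnalyser (_src : String) : Verdict :=
          .unknown

        -- A more useful (hypothetical) analyser would return .unknown less often,
        -- eg by recognising structural recursion, but may still give up on hard cases.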
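
    And the one-sided guarantees mentioned above, in the same style (made-up toy examples; `average` is deliberately buggy):

        -- A type checker rejecting nonsense: there's no way to add a Nat and a
        -- String, so something is definitely wrong.
        -- #eval 1 + "two"   -- type error, does not compile

        -- But passing the type checker doesn't mean the programme is flawless:
        -- this compiles, yet it was meant to compute the average and forgets to
        -- divide by the length.
        def average (xs : List Nat) : Nat :=
          xs.foldl (· + ·) 0

        -- A proof assistant gives the opposite guarantee: if it accepts this proof,
        -- `average` definitely meets this (tiny) spec...
        theorem average_nil : average [] = 0 := rfl

        -- ...but failing to find a proof of some harder spec doesn't make the
        -- programme wrong; the proof may simply not have been found yet.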