
Comment by root_axis

1 day ago

Software is more amenable to LLMs because there is a rich source of highly relevant training data that corresponds directly to the building blocks of software, and the "correctness" of software is quasi-self-verifiable. This isn't true for pretty much anything else.
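To make "quasi-self-verifiable" concrete, here is a toy sketch (the task and names are made up, not from any real system): a candidate implementation can simply be run against tests, and the pass/fail result is an objective signal that needs no human judgment.

```python
# Toy illustration of software being "quasi-self-verifiable": executing the
# candidate against known cases yields an objective pass/fail signal.

def candidate_sort(xs: list[int]) -> list[int]:
    """A candidate implementation, e.g. one proposed by an LLM."""
    return sorted(xs)

def verify(impl) -> bool:
    """Run the candidate against known cases; the return value is the signal."""
    cases = [([3, 1, 2], [1, 2, 3]), ([], []), ([5, 5, 1], [1, 5, 5])]
    return all(impl(inp) == expected for inp, expected in cases)

print("passes:", verify(candidate_sort))  # True or False, nothing in between
```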

The more verifiable the domain the better suited. We see similar reports of benefits in advanced mathematics research from Terence Tao, though admittedly some of those reports seem to amount to cases where very few people knew that relevant data existed, but the LLM had it in its training corpus. Still, verifiably correct domains are well-suited.

So the concept of formal verification is as relevant as ever, though when building interconnected programs the complexity rises and verifiability becomes more difficult.
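On the formal verification side, a proof assistant makes the correctness signal even harder: the kernel either accepts a proof or rejects it. A minimal Lean 4 example (the theorem is trivial and purely illustrative):

```lean
-- The Lean kernel either accepts this proof or reports an error, so
-- "is it correct?" has a mechanical answer.
theorem sum_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```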

  • > The more verifiable the domain the better suited.

    Absolutely. It's also worth noting that in the case of Tao's work, the LLM was producing Lean and Python code.

  • I think the solution in harder-to-verify cases is to provide AI (sub-)agents with a really good set of instructions: detailed guidelines on what they should do and how they should think, explore, and break down problems. Potentially tens of thousands of words of instructions to get the LLM to act as a competent employee in the field. The models then need to be good enough at instruction-following to actually explore the problem in the right way and apply basic intelligence to solve it. Basically, treat the LLM as a competent general knowledge worker that is unfamiliar with the specific field, and give it detailed instructions on how to succeed in that field.

    For the easy-to-verify fields like coding, you can train "domain intuitions" directly into the LLM (and some of this training should generalize to other knowledge-work abilities), but for other fields you would need to supply them in the prompt, since those abilities cannot be trained into the LLM directly. This will need better models but might become doable in a few generations.

    • > I think the solution in harder-to-verify cases is to provide AI (sub-)agents with a really good set of instructions: detailed guidelines on what they should do and how they should think, explore, and break down problems

      Using LLMs to validate LLMs isn't a solution to this problem. If the system can't self-verify, then there's no signal to tell the LLM that it's wrong. The LLM is fundamentally unreliable; that's why you need a self-verifying system to guide and constrain the token generation.
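To sketch what that self-verifying system looks like in the coding case (a rough outline only; `generate_candidate` and `run_tests` are hypothetical stand-ins for the LLM call and the test harness):

```python
# Sketch of an external verification loop: the model proposes, an independent
# check (tests, a compiler, a proof checker) accepts or rejects. The check is
# the only trusted signal; the model never grades its own work.
from typing import Callable, Optional

def solve(generate_candidate: Callable[[str], str],
          run_tests: Callable[[str], bool],
          task: str,
          max_attempts: int = 5) -> Optional[str]:
    feedback = ""
    for _ in range(max_attempts):
        candidate = generate_candidate(task + feedback)
        if run_tests(candidate):  # external, self-verifying signal
            return candidate
        feedback = "\nPrevious attempt failed its tests; try again."
    return None  # no verified solution found within the budget
```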

Presumably at some point capability will translate to other domains even if the exchange rate is poor. If it can autonomously write software and author CAD files then it can autonomously design robots. I assume everything else follows naturally from that.

  • > If it can autonomously write software and author CAD files then it can autonomously design robots.

    It can't because the LLM can't test its own design. Unlike with code, the LLM can't incrementally crawl its way to a solution guided by unit tests and error messages. In the real world, there are material costs for trial and error, and there is no CLI that allows every aspect of the universe to be directly manipulated with perfect precision.

    • You don't need perfect precision, just a sufficiently high-fidelity simulation. For example, hypersonic weapon design being carried out computationally was the historical (pre-AI) reason to restrict the export of certain electronics to China.

      OpenAI demoed training a model for a robotic hand using this approach years ago.
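A toy sketch of that "do the trial and error in simulation" idea (everything here is made up and one-dimensional; real systems like the OpenAI robotic hand work used physics engines and reinforcement learning, not random search):

```python
# Toy version of "learn in simulation because real-world trial and error is
# expensive": a made-up 1-D "robot" must reach a target, and we search for a
# controller gain entirely inside the simulator.
import random

def simulate(gain: float, target: float = 1.0, steps: int = 50) -> float:
    """Run the toy dynamics; return the final distance to the target (lower is better)."""
    position, velocity = 0.0, 0.0
    for _ in range(steps):
        velocity += gain * (target - position)  # proportional controller
        velocity *= 0.8                         # damping
        position += 0.1 * velocity
    return abs(target - position)

def train(trials: int = 200) -> float:
    """Random search over the gain; all trial and error happens in simulation."""
    best_gain, best_error = 0.0, float("inf")
    for _ in range(trials):
        gain = random.uniform(0.0, 5.0)
        error = simulate(gain)
        if error < best_error:
            best_gain, best_error = gain, error
    return best_gain

print("best gain:", round(train(), 2))
```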