Comment by jonas21

1 day ago

> At this point it seems pretty clear that LLMs are having a major impact on how software is built, but for almost every other industry the practical effects are mostly incremental.

Even just a year ago, most people thought the practical effects in software engineering were incremental too. It took another generation of models and tooling to get to the point where it could start having a large impact.

What makes you think the same will not happen in other knowledge-based fields after another iteration or two?

> most people thought the practical effects in software engineering were incremental too

Hmm... Are you saying it's having a clear positive (never mind "transformative") impact somewhere? Can you point to any place where we can see observable, clear positive impact?

  • I know many companies that have replaced customer support agents with LLM-based agents. Replacing support with AI isn't new, but what is new is that the LLM-based agents have higher CSAT (customer satisfaction) rates than the humans they are replacing (i.e., it's not just cost anymore... it's cost and quality).

    • Well, as a customer who has had to deal with AI bots as customer service, I have significantly lower customer satisfaction, because I don't wanna deal with some clanker that doesn't really understand what I am talking about.

  • It doesn’t need to provide “observable clear positive impact”. As long as the bosses think it improves the numbers, it will be used. See offshoring, or advertising everywhere.

Software is more amenable to LLMs because there is a rich source of highly relevant training data that corresponds directly to the building blocks of software, and the "correctness" of software is quasi-self-verifiable. This isn't true for pretty much anything else.
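
To make "quasi-self-verifiable" concrete, here is a minimal Python sketch (the function and test cases are illustrative, not from any real agent framework). The point is that a test harness turns "is this code correct?" into a mechanical check whose failures come back as plain text: exactly the feedback signal an LLM loop can iterate against, and something most other domains lack.

```python
# Minimal sketch of quasi-self-verifiability: a test suite turns
# correctness into a mechanical check whose failures come back as
# text an LLM agent could iterate on. The candidate function stands
# in for model-generated code.

import traceback

def candidate_slug(title: str) -> str:
    # Stand-in for LLM-generated code (deliberately buggy:
    # it does not lowercase the input).
    return title.replace(" ", "-")

def run_tests(fn) -> str | None:
    """Run the checks; return None on success, or an error message
    that could be fed back into the model's next attempt."""
    cases = [
        ("Hello World", "hello-world"),
        ("already-slugged", "already-slugged"),
    ]
    for raw, expected in cases:
        try:
            got = fn(raw)
            assert got == expected, f"slug({raw!r}) == {got!r}, expected {expected!r}"
        except Exception:
            return traceback.format_exc()
    return None

feedback = run_tests(candidate_slug)
print("all tests pass" if feedback is None else feedback)
```

Nothing in, say, contract law or product strategy produces an equivalent of `run_tests`: there is no cheap, mechanical oracle whose output can drive the next iteration.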

  • The more verifiable the domain, the better suited it is. We see similar reports of benefits in advanced mathematics research from Terence Tao, though some of those reports seem to amount to "very few people knew a relevant result existed, but the LLM had it in its training corpus". Still, verifiably correct domains are well suited.

    So the concept of formal verification is as relevant as ever; and when building interconnected programs, complexity rises and verifiability becomes more difficult.

    • > The more verifiable the domain the better suited.

      Absolutely. It's also worth noting that in the case of Tao's work, the LLM was producing Lean and Python code (see the toy Lean sketch at the end of this thread).

    • I think the solution in harder-to-verify cases is to give AI (sub-)agents a really good set of instructions: detailed guidelines on what to do, how to think, and how to explore and break down problems. Potentially tens of thousands of words of instructions, so that the LLM acts as a competent employee in the field. The models then need to be good enough at instruction-following to actually explore the problem in the right way and apply basic intelligence to solve it. Basically, treat the LLM as a competent general knowledge worker that is unfamiliar with the specific field, and give it detailed instructions on how to succeed in that field.

      For easy-to-verify fields like coding, you can train "domain intuitions" directly into the LLM (and some of that training should generalize to other knowledge work), but for other fields you would need to supply them in the prompt, since those abilities cannot be trained into the LLM directly. This will need better models, but might become doable in a few generations.

  • Presumably at some point capability will translate to other domains even if the exchange rate is poor. If it can autonomously write software and author CAD files then it can autonomously design robots. I assume everything else follows naturally from that.

    • > If it can autonomously write software and author CAD files then it can autonomously design robots.

      It can't because the LLM can't test its own design. Unlike with code, the LLM can't incrementally crawl its way to a solution guided by unit tests and error messages. In the real world, there are material costs for trial and error, and there is no CLI that allows every aspect of the universe to be directly manipulated with perfect precision.

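To make the verifiability point from earlier in the thread concrete (a toy illustration, not code from Tao's actual work): in a proof assistant like Lean, the kernel plays the role that unit tests play for software. A proof either type-checks or fails with an error message the model can iterate on.

```lean
-- Toy example: the Lean kernel mechanically verifies this proof.
-- If an LLM emitted a wrong proof term here, elaboration would fail
-- with an error message it could iterate on, much like a failing test.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```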