Comment by keepamovin

22 days ago

Your hypothetical misses praxis: in my experience an LLM can pick up any new syntax with ease. From a few examples, it can generate more. With a compiler (even a partial one covering limited syntax), it can correct itself. It soon becomes fluent simply from the context of your codebase. You don't need to "train" an LLM to recognize language syntax. It's effortless for it to pick it up.

Or, maybe my lanng just had LLM-easy syntax - which would be good - but I think this is more just par for the course for LLMs, bud.
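To make the "with a compiler, it can correct" loop concrete, here is a minimal sketch. It assumes Python's built-in `compile()` as a stand-in for the new language's compiler, and `generate` is a hypothetical callable wrapping whatever model you use; neither reflects my actual setup.

```python
from typing import Callable, Optional


def syntax_error(source: str) -> Optional[str]:
    """Return a one-line error message, or None if the source parses."""
    try:
        # Parse only; nothing in `source` is executed.
        compile(source, "<candidate>", "exec")
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


def generate_until_valid(
    generate: Callable[[str], str],  # hypothetical model wrapper
    prompt: str,
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask for code, feed compiler errors back, stop at the first valid result."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = generate(prompt + feedback)
        error = syntax_error(candidate)
        if error is None:
            return candidate
        feedback = f"\n# previous attempt failed: {error}"
    return None
```

The loop only checks that the output parses. It says nothing about whether the code is idiomatic or correct.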

5 comments

fragmede 21 days ago

Syntax != semantics. The LLM being able to adhere to the syntax is one thing; the LLM picking up various language-isms is another. In Python, you don't want to see

    for i in range(len(arr)):
        something(arr[i])

because the pythonic way is simply:

    for i in arr:
        something(i)

or even:

    [something(i) for i in arr]

The first version is absolutely syntactically correct, but terrible Python. How do you teach the LLM that?

Bugs don't come from syntax errors. If you've got a syntax error, the code doesn't compile or fails to run entirely. So we're not talking about the LLM learning the syntax; I'm asking about the LLM learning the deeper semantics of lanng.
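For the `range(len(arr))` case specifically, one partial answer is that a single idiom can be written down as a mechanical check. This is a rough sketch using Python's standard `ast` module; `find_range_len_loops` is a name made up for illustration, and a real linter would cover far more cases than this one pattern.

```python
import ast


def find_range_len_loops(source: str) -> list[int]:
    """Return the line numbers of `for ... in range(len(x))` loops."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.For):
            continue
        it = node.iter
        # Match exactly range(<one argument>) where that argument is len(...).
        if (
            isinstance(it, ast.Call)
            and isinstance(it.func, ast.Name)
            and it.func.id == "range"
            and len(it.args) == 1
            and isinstance(it.args[0], ast.Call)
            and isinstance(it.args[0].func, ast.Name)
            and it.args[0].func.id == "len"
        ):
            hits.append(node.lineno)
    return sorted(hits)
```

That handles one idiom someone already knew to write a rule for. It doesn't answer the broader question of idioms nobody has codified yet.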

mike_hearn 22 days ago

I'm also looking at this topic right now.

I think you're right within limits but the issue is semantics and obscure features. If the language differs from existing languages in only trivial ways, then LLMs can pick it up quickly. But then the value of such a language is trivial. If you deviate in bigger ways, it's harder to properly use just based on pre-existing code.

Here's a simple case study: Kotlin is semantically Java with a more concise syntax, but part of what makes it more concise is the Kotlin standard library adds a lot of utility methods to Java. Many utility methods are only needed rarely. LLMs can write competent Kotlin because they read the user guide and saw millions of examples in their training set, but if they were trying to learn exclusively from small examples in their context window, they wouldn't know about those obscure utilities and would never use them. Much of the benefit would be lost.
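A rough analogue in Python rather than Kotlin, since Python is the language already used in this thread: both functions below compute running totals, but the second relies on `itertools.accumulate`, a standard library utility you would only reach for if you already knew it existed. The function names are made up for illustration.

```python
from itertools import accumulate


def running_totals_by_hand(values):
    """What you write if you've never seen the utility."""
    totals = []
    total = 0
    for v in values:
        total += v
        totals.append(total)
    return totals


def running_totals_stdlib(values):
    """Same result, using the standard library helper."""
    return list(accumulate(values))
```

Both are correct, so nothing forces the shorter version to appear unless the model has seen the utility somewhere.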

Given this, I see a few ways forward:

1. Just give up on designing new programming languages. Languages are user interfaces but the user is now an LLM with near infinite patience, so who cares if they aren't ideal. If the LLM has to brute-force a utility method every single time instead of using a standard library... ok. Whatever. This would parallel what happened with CPU ISAs. There are very few of them today, they don't matter much and they're designed in ways where only machines can handle all the details, because everyone codes in higher-level languages and compilers write all the assembly.

2. Define new languages as a delta on top of some well known initial language, ensuring that the language definition always fits inside a prompt as a skill. In this world we don't bother with new syntaxes anymore unless that syntax change encodes significant new semantics, because it's not worth wasting tokens showing the LLM what to do. Everything is just an extension to Python, in this world. The line between new languages and new libraries becomes increasingly blurred as runtimes get more powerful and flexible.

3. New languages have to come with their own fine tuned and hosted coding LLM. Maybe that's even a way to monetize new language creation.

4. The big model firms offer a service where you can pay to get your data into the training set. Then you use the giant prompt+delta mechanism to get an LLM to generate a textbook of sample code, pay to get it into the training set, wait six months for another foundation model run and then your language becomes usable.

Of these I think (2) is currently the most practical.
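As a toy illustration of option (2), where a "language" is really a library layered on Python: the sketch below adds pipeline-style chaining with `|` through operator overloading. `Stage`, `keep` and `each` are hypothetical names invented for this example, not part of any real library.

```python
class Stage:
    """Wraps a function so it can be chained with `|`."""

    def __init__(self, fn):
        self.fn = fn

    def __ror__(self, value):
        # Python calls this for `value | stage` when the type of `value`
        # does not itself handle `|` with a Stage on the right.
        return self.fn(value)


def keep(predicate):
    """Stage that filters a list."""
    return Stage(lambda xs: [x for x in xs if predicate(x)])


def each(fn):
    """Stage that maps a function over a list."""
    return Stage(lambda xs: [fn(x) for x in xs])


result = [1, 2, 3, 4] | keep(lambda x: x % 2 == 0) | each(lambda x: x * 10)
print(result)  # [20, 40]
```

The whole "language definition" here is a dozen lines that fit in a prompt, and the LLM can lean on everything it already knows about Python for the rest.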

keepamovin 22 days ago

This sounds academic, like a thought experiment. I have experience and can tell you this is not the case. I am using a significantly different language and the LLMs have zero problems using it.

There are likely challenges here, but they're not the ones you're seeing so far.

mike_hearn 22 days ago

OK, that's valuable to know, but how is your language different? You were discussing syntax previously. How well does the LLM handle your language's different standard library and how big is it?
  