
Comment by kenjackson

4 hours ago

Has anyone tried creating a language that would be good for LLMs? I feel like what would be good for LLMs might not be the same thing that is good for humans (but I have no evidence or data to support this, just a hunch).

The problem with this: the reason LLMs are so good at writing Python/Java/JavaScript is that they've been trained on a metric ton of code in those languages, have seen the good, the bad, and the ugly, and been tuned toward the good. A new language would mean training from scratch, and if we introduce new paradigms that are 'good for LLMs but bad for humans', humans will struggle to write good code in it, making the training process harder. Even worse, say you get a year and 500 features into that repo and the LLM starts going rogue - who's gonna debug that?

  • But coding models are largely trained on synthetic data.

    For example, Claude can fluently generate Bevy code as of the training cutoff date, and there's no way there's enough training data on the web to explain this. There's an agent somewhere in a compile test loop generating Bevy examples.

    A custom LLM language could have fine-grained fuzzing, mocking, concurrent calling, memoization, and other features that let LLMs generate and debug synthetic code more effectively (sketched below).

    If that works, there's a pathway to a novel language having higher quality training data than even Python.
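
    To make that concrete, here is a rough Python sketch - every name is made up, and in the imagined language these would be built-in annotations rather than library calls - of the generate-and-check loop that first-class memoization, fuzzing, and concurrency would collapse into a few lines.

    ```python
    # Hypothetical sketch: what language-level memoization, fuzzing, and
    # concurrent execution could give an LLM agent as a tight feedback loop.
    import random
    from concurrent.futures import ThreadPoolExecutor
    from functools import lru_cache

    @lru_cache(maxsize=None)          # stand-in for built-in memoization
    def candidate_sort(xs: tuple) -> tuple:
        return tuple(sorted(xs))      # the (possibly LLM-generated) code under test

    def fuzz_case(seed: int) -> bool:
        """One fuzz iteration: random input, checked against a trusted oracle."""
        rng = random.Random(seed)
        xs = tuple(rng.randint(-100, 100) for _ in range(rng.randint(0, 20)))
        return candidate_sort(xs) == tuple(sorted(xs))

    if __name__ == "__main__":
        with ThreadPoolExecutor(max_workers=8) as pool:  # stand-in for built-in concurrency
            results = list(pool.map(fuzz_case, range(1000)))
        print(f"{sum(results)}/{len(results)} fuzz cases passed")
    ```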

> Has anyone tried creating a language that would be good for LLMs?

I’ve thought about this and arrived at a rough sketch.

The first principle is that models like ChatGPT do not execute programs; they transform context. Because of that, a language designed specifically for LLMs would likely not be imperative (do X, then Y), state-mutating, or instruction-step driven. Instead, it would be declarative and context-transforming, with its primary operation being the propagation of semantic constraints.

The core abstraction in such a language would be the context, not the variable. In conventional programming languages, variables hold values and functions map inputs to outputs. In a ChatGPT-native language, the context itself would be the primary object, continuously reshaped by constraints. The atomic unit would therefore be a semantic constraint, not a value or instruction.

An important consequence of this is that types would be semantic rather than numeric or structural. Instead of types like number, string, bool, you might have types such as explanation, argument, analogy, counterexample, formal_definition.

These types would constrain what kind of text may follow, rather than how data is stored or laid out in memory. In other words, the language would shape meaning and allowable continuations, not execution paths. An example:

@iterate: refine explanation until clarity ≥ expert_threshold
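
Purely as an illustration of that idea, and not any real implementation: a small Python sketch of a context reshaped by semantic constraints. Every name here is made up.

```python
# Speculative model of a "ChatGPT-native" program: the context is the primary
# object, and semantic types (explanation, counterexample, ...) act as
# constraints on what text may follow, not on how data is laid out in memory.
from dataclasses import dataclass

@dataclass(frozen=True)
class SemanticConstraint:
    kind: str        # e.g. "explanation", "argument", "counterexample"
    predicate: str   # natural-language condition the continuation must satisfy

@dataclass
class Context:
    text: str
    constraints: tuple = ()

    def constrain(self, kind: str, predicate: str) -> "Context":
        # Propagating a constraint never mutates state; it yields a new,
        # more tightly constrained context (the space of continuations shrinks).
        return Context(self.text, self.constraints + (SemanticConstraint(kind, predicate),))

# "@iterate: refine explanation until clarity >= expert_threshold" might desugar to:
ctx = Context("Explain attention in transformers.")
ctx = ctx.constrain("explanation", "clarity >= expert_threshold")
ctx = ctx.constrain("counterexample", "must address a common misconception")
print(ctx.constraints)
```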

There are two separate needs here. One is a language used for computation where the code will be discarded and only the output of the program matters. The other is a language whose code will eventually be read or validated by humans.

I don't know Rust, but I use it with LLMs a lot because, unlike Python, it has fewer ways to do things, along with all the built-in checks you get at build time.

I want to create a language that allows an LLM to dynamically decide what to do.

A non-deterministic programming language, with options to drop down into JavaScript or even C if you need to specify certain behaviors - roughly along the lines of the sketch below.
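
For what it's worth, a Python sketch of how that might feel; llm_choose is a made-up stub standing in for a model call, and nothing here is a real API.

```python
# Sketch of "non-deterministic by default, deterministic on demand":
# the runtime lets a model pick among behaviours, while pinned() marks
# code whose behaviour must stay fully specified (the "drop down to C/JS" case).
import random
from typing import Callable, Sequence

def llm_choose(options: Sequence[Callable[[str], str]], context: str) -> Callable[[str], str]:
    """Hypothetical stub for a model-driven choice; here it just picks at random."""
    return random.choice(list(options))

def pinned(fn: Callable[[str], str]) -> Callable[[str], str]:
    """Marks behaviour the model is never allowed to reinterpret."""
    return fn

@pinned
def normalize(s: str) -> str:
    return s.strip().lower()

def summarize_casual(s: str) -> str:
    return s[:40] + "..."

def summarize_formal(s: str) -> str:
    return "Summary: " + s[:40]

if __name__ == "__main__":
    handler = llm_choose([summarize_casual, summarize_formal], context="user prefers brevity")
    print(handler(normalize("  Some Long Input Text about language design  ")))
```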

I'd need to be much better at this though.