Comment by sillysaurusx

3 hours ago

It’s easy to see that LLMs don’t merely recombine their training data. Claude can program in Arc, a mostly dead language. It can also make use of new language constructs. So either all programming language constructs are merely remixes of existing ideas, or LLMs are capable of working in domains where no training data exists.

LLMs ingest and output tokens, but they don’t compute with them. They have internal representations of concepts, so they have some capability to work with things which they didn’t see but can map onto what they know. The surprise and the whole revolution we’re going through is that it works so well.

  • > they don’t compute with them

    Isn't this exactly what chain-of-thought does? It's doing computation by emitting tokens forward into its context, so it can represent states wider than its residual stream and evaluate functions that a single forward pass through the weights can't express. It just happens to look like a person thinking out loud because those were the most useful patterns in the training data.
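To make the chain-of-thought point concrete, here is a toy analogy in Python. It is an illustration only, not a claim about how a transformer works internally: a hypothetical `forward_pass` can do only a bounded amount of work per call (add one column of digits), but by writing its intermediate state (digit and carry) into a scratchpad and reading it back on the next call, the loop adds numbers of any length, which no single call could do.

```python
from typing import List


def forward_pass(a: str, b: str, scratch: List[str]) -> str:
    """Hypothetical stand-in for one forward pass: a fixed, bounded amount of work.

    Reads the carry from the last scratchpad token, adds exactly one column of
    digits (least significant first), and emits one "digit:carry" token.
    """
    i = len(scratch)  # which column we are on
    carry = int(scratch[-1].split(":")[1]) if scratch else 0
    da = int(a[-1 - i]) if i < len(a) else 0
    db = int(b[-1 - i]) if i < len(b) else 0
    total = da + db + carry
    return f"{total % 10}:{total // 10}"


def chain_of_thought_add(a: str, b: str) -> str:
    """Add two decimal strings by repeatedly emitting tokens into a scratchpad."""
    scratch: List[str] = []
    # Keep emitting while columns remain, or while the last token still carries a 1.
    while len(scratch) < max(len(a), len(b)) or (scratch and scratch[-1].endswith(":1")):
        scratch.append(forward_pass(a, b, scratch))
    # The answer is read back out of the scratchpad, most significant digit first.
    digits = [tok.split(":")[0] for tok in reversed(scratch)]
    return "".join(digits).lstrip("0") or "0"
```

For example, `chain_of_thought_add("999", "1")` returns `"1000"`: the scratchpad ends up as `["0:1", "0:1", "0:1", "1:0"]`, so the "wide" state lives in the context rather than in any single step.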

They recombine and reuse the patterns in their training data, not the surface level training data itself.

An LLM generating Arc code is using the Lisp patterns it learnt in training, and maybe patterns from other programming languages too.

> So either all programming language constructs are merely remixes of existing ideas, or LLMs are capable of working in domains where no training data exists.

And yet LLMs/AIs can't reliably count parentheses.
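Counting parentheses is about the simplest mechanical check there is, which is what makes the failure striking. A minimal sketch, assuming plain source text and deliberately ignoring string literals and comments (a real Lisp reader would have to handle both); `first_paren_error` is a hypothetical helper name:

```python
from typing import List, Optional


def first_paren_error(src: str) -> Optional[int]:
    """Return the index of the first unmatched paren in src, or None if balanced.

    Assumption: string literals and comments are not special-cased.
    """
    open_positions: List[int] = []  # stack of indices of '(' not yet closed
    for i, ch in enumerate(src):
        if ch == "(":
            open_positions.append(i)
        elif ch == ")":
            if not open_positions:
                return i  # a ')' with nothing left to close
            open_positions.pop()
    # Any '(' still on the stack was never closed; report the earliest one.
    return open_positions[0] if open_positions else None
```

A stack of positions, rather than a bare counter, is what lets the check point at the offending paren instead of only saying "unbalanced".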

For example, if you take away Claude's "let" forms, forcing it to desugar them into "lambda" forms, it fails very quickly. This is a purely mechanical transformation and should be error-free, yet the significant increase in ambiguity completely stumps LLMs/AIs after about three variables.
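For reference, the let-to-lambda rewrite really is mechanical enough to fit in a few dozen lines. This sketch assumes Scheme-style `(let ((name value) ...) body ...)` syntax rather than Arc's exact forms, represents s-expressions as nested Python lists, and ignores quoting; `parse`, `desugar_let`, and `to_string` are hypothetical helpers:

```python
from typing import List, Tuple, Union

SExpr = Union[str, List["SExpr"]]


def parse(src: str) -> SExpr:
    """Tiny s-expression reader: atoms stay strings, lists become Python lists."""
    tokens = src.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos: int) -> Tuple[SExpr, int]:
        tok = tokens[pos]
        if tok == "(":
            items: List[SExpr] = []
            pos += 1
            while tokens[pos] != ")":
                item, pos = read(pos)
                items.append(item)
            return items, pos + 1  # skip the closing ')'
        return tok, pos + 1

    expr, _ = read(0)
    return expr


def desugar_let(expr: SExpr) -> SExpr:
    """Rewrite (let ((x e1) (y e2)) body ...) as ((lambda (x y) body ...) e1 e2), recursively."""
    if isinstance(expr, str):
        return expr
    if expr and expr[0] == "let":
        bindings = expr[1]
        body = [desugar_let(form) for form in expr[2:]]
        names = [binding[0] for binding in bindings]
        values = [desugar_let(binding[1]) for binding in bindings]
        return [["lambda", names, *body], *values]
    return [desugar_let(e) for e in expr]


def to_string(expr: SExpr) -> str:
    """Print an s-expression back out; the list nesting guarantees balanced parens."""
    if isinstance(expr, str):
        return expr
    return "(" + " ".join(to_string(e) for e in expr) + ")"
```

Usage: `to_string(desugar_let(parse("(let ((a 1) (b 2)) (+ a b))")))` gives `((lambda (a b) (+ a b)) 1 2)`. Nesting is where the parentheses pile up: `(let ((a 1)) (let ((b (+ a 1))) (* a b)))` becomes `((lambda (a) ((lambda (b) (* a b)) (+ a 1))) 1)`. Working on a tree instead of raw text is the design choice that makes the transformation error-free, since the printer never has to count anything.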

This is why languages like Rust, with strong typing and lots of syntax, are so LLM-friendly: the syntax shackles the LLM, which in turn keeps it on target.