Comment by joshwa

4 days ago

The thing about language models is that they are *language* models. They don't actually parse XML structure or turn code into an AST; they are just next-token generators.
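To make that concrete, here's a toy sketch (not any real model's code, just an illustrative stand-in): a language model is essentially a function from a token prefix to a distribution over the next token, decoded one token at a time. Note that the "XML" below is never parsed; `<` and `>` are ordinary tokens with no special status.

```python
# Toy next-token generator with a hand-written bigram table.
# A real LM learns these probabilities from data; the decoding
# loop (pick a next token, append, repeat) is the same idea.
bigram = {
    "<s>":  {"<": 0.8, "hello": 0.2},   # <s> = start-of-sequence token
    "<":    {"name": 0.7, "/": 0.3},    # "<" is just another token
    "name": {">": 1.0},
    ">":    {"Ada": 1.0},
}

def greedy_next(prefix):
    """Return the most probable next token given the last token."""
    dist = bigram[prefix[-1]]
    return max(dist, key=dist.get)

tokens = ["<s>"]
for _ in range(4):
    tokens.append(greedy_next(tokens))
print(tokens[1:])  # ['<', 'name', '>', 'Ada']
```

The model emits something that *looks* like markup purely because those token sequences were probable, not because it holds a tree structure anywhere.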

Individual models may have supplemented their training with things that look like structure (e.g. Claude with its XMLish delimiters), but it's far from universal.

Ultimately, if we want better fidelity to the concepts we're referencing, we're better off working from the larger, richer dataset of token sequences already in the training data--the total published written output of humanity.