Comment by iLemming

1 day ago

> LLM-s will stop writing human-readable code, as this is additional obstacle, they will work directly with binaries

LLMs already operate internally on embeddings: dense floating-point vectors in a high-dimensional space. They don't need an "intermediary language"; there is no discrete symbolic reasoning, it's continuous numerical computation all the way through.

If you made an LLM generate bytecode, you'd need it to "read" bytecode tokens as context for predicting the next bytecode token. The model would have to learn the statistical patterns of bytecode sequences, which are far less structured and more arbitrary than (human-readable) source code. Bytecode is optimized for machine execution, not for exposing patterns to a next-token predictor.
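You can see the mismatch with Python's standard-library `dis` module: two lines of regular, high-level source expand into a longer stack-machine instruction stream (whose opcode names even change between interpreter versions). A quick illustration, not a benchmark:

```python
# Compare a trivial function's source with its CPython bytecode.
import dis

def add_double(x, y):
    return (x + y) * 2

source = "def add_double(x, y):\n    return (x + y) * 2\n"
instructions = list(dis.Bytecode(add_double))

# The bytecode stream is longer and less regular than the source.
print(len(source.splitlines()), "source lines")
print(len(instructions), "bytecode instructions")
for ins in instructions:
    print(ins.opname, ins.argrepr)
```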

So Lisp actually is the sweet spot in the direction you're gesturing at, but you've got the direction backwards. You don't want to go lower (toward bytecode); you want to go toward representations that have fewer syntactic rules, are structural, and are highly locally predictable. That is basically Lisp. The AST is the syntax. There's no parsing ambiguity, minimal syntactic variation, and the structure is self-describing. The model spends fewer tokens on syntactic ceremony and more on semantic content. That is why LLMs are surprisingly good at generating Elisp and Clojure code.
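"The AST is the syntax" can be made concrete: a reader for the core of any Lisp fits in a few lines, because the text maps one-to-one onto nested lists. A minimal sketch in Python (a toy reader, not any real Lisp's implementation; no strings, quoting, or error handling):

```python
# Minimal s-expression reader: split on parens/whitespace, then build
# nested lists. There is no operator precedence, no statement/expression
# split, no grammar ambiguity: the nesting of parentheses IS the tree.
def tokenize(src):
    return src.replace("(", " ( ").replace(")", " ) ").split()

def read(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        node = []
        while tokens[0] != ")":
            node.append(read(tokens))
        tokens.pop(0)  # drop the closing ")"
        return node
    return tok  # atom: kept as a plain string here

tree = read(tokenize("(defun square (x) (* x x))"))
print(tree)  # → ['defun', 'square', ['x'], ['*', 'x', 'x']]
```

That's the whole grammar. There is no equivalent twenty-line parser for C or Python source, which is exactly the "syntactic ceremony" a next-token predictor otherwise has to spend capacity on.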