Comment by davebren

6 days ago

In effect, the source code is being copied by the LLM. This is what it's designed to do: LLMs are a lossy statistical compression of their training data.

If you give it a prompt telling it to replicate a product that's in its training set, then its optimal next-token prediction output is going to be a lossy copy of that product's source code.