Comment by theappsecguy

2 days ago

Nothing about the information it feeds you is novel. It's all stolen repetition of someone else's work.

Bizarre to say that. When I have it perform work on a bespoke code base on a niche videogame, in a less commonly used language, is that still "regurgitating stuff"?

No, it is impossible for it to have seen this combination of things.

It routinely produces, suggests, and correctly implements novel things that had not existed.

You can see this yourself by learning how LLMs work, or, anecdotally, by using these tools.

  • LLMs are terrible at generating code for “less commonly used languages”. They require LOTS of data for high accuracy.

    I describe it this way: they are good at interpolating from what data they were trained on, but terrible at extrapolating. I agree with the parent that the LLM-generated content isn’t novel, it’s just a rehash of two things it was trained on.

    • I have wasted quite a number of hours trying to use LLMs to write things for less common languages. Sure they can one-shot some impressive stuff in C#, Python, and JavaScript… but try working in Object Pascal: it’s non-obvious hallucination after non-obvious hallucination, presented confidently enough to make it difficult to see it’s complete garbage, so you waste a ton of time trying to polish a turd.


That is simply not true. The naive “glorified auto-complete / stochastic parrot” argument may have some merit when applied to generic pre-trained models, which only learn from unsupervised next-token prediction. But the post-training through reinforcement learning that the frontier models undergo is very sophisticated, and they genuinely learn to do novel things that are purely the work of the model being trained (and the work of the GPUs they burn along the way, of course).

Thank god I bought the alphabet before learning it unlike one of those stealing heathens.

In your hate of AI please don't build the world in The Right to Read.