Comment by nicklecompte

1 year ago

"an idea that can't be expressed as a combination of existing concepts."

The problem is that if an LLM hasn't been pretrained on the specific idea, it won't have a grasp of what the correct concepts are to make the combination. It will be liable to substitute more "statistically likely" concepts, but since that statistic is based on a training set where the concept didn't exist, its estimate of "likely" is flawed.

One good example is patents: https://nitter.net/mihirmahajan/status/1731844283207229796 LLMs can imitate the appropriate prose style, but they struggle to maintain semantic consistency when handling new patents for inventions that, by definition, wouldn't have appeared in the training set. And this extends to almost any writing: if you are making especially sophisticated or nuanced arguments, LLMs will struggle to rephrase them accurately.

(Note that GPT-4 is still extremely bad at document summarization, even for uninteresting documents: your Q3 PnL number is not something that appeared in the training set, and GPT-4 is liable to screw it up by substituting a "statistically likely" number.)

In my experience GPT-3.5 is extremely bad at F#: although it can do simple tasks like "define a datatype that works for such-and-such," it is much less proficient at basic functional programming in F# than it is in Haskell - far more likely to make mistakes, or even identifiably plagiarize from specific GitHub repos (even my own). That's because there's a ton of Functional Programming 101 tutorials in Haskell, but very few in F#. I am not sure about GPT-4. It does seem better, but I haven't tested it as extensively.
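
To make that concrete, here is a rough sketch of the kind of "FP 101" task I have in mind; the Shape type and the area/totalArea functions are just hypothetical illustrations, not from any actual test:

```fsharp
// A basic datatype plus a fold over it - the kind of beginner F# exercise
// that has far fewer tutorials floating around than the Haskell equivalent.
type Shape =
    | Circle of radius: float
    | Rectangle of width: float * height: float

// Area of a single shape via pattern matching.
let area shape =
    match shape with
    | Circle r -> System.Math.PI * r * r
    | Rectangle (w, h) -> w * h

// Sum the areas of a list of shapes with List.fold.
let totalArea shapes =
    shapes |> List.fold (fun acc s -> acc + area s) 0.0

// Example: totalArea [ Circle 1.0; Rectangle (2.0, 3.0) ]  // ~9.14
```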

"The problem is that if an LLM hasn't been pretrained on the specific idea, it won't have a grasp of what the correct concepts are to make the combination."

And this isn't true of humans?