Comment by choeger

1 year ago

Great comment. Just one thought: Language, unlike weather, is meta-circular. Everything we know about specific words or sentences is itself encoded in words and sentences. So the embedding encodes a subset of human knowledge.

Hence, an LLM is predicting not only language but language that carries some sort of meaning.

That kind of re-embedding is also present in weather: it is why perfect forecasting is impossible, and why we talk about the butterfly effect.
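To make the butterfly-effect point concrete, here is a minimal sketch of the Lorenz '63 system integrated with forward Euler. The step size, starting point, and perturbation size are arbitrary illustration choices, not anything from the original comment; the point is just that a one-in-a-million difference in the starting state grows to attractor-sized separation within a few dozen model time units.

```python
import numpy as np

# Lorenz '63: a toy model of atmospheric convection, the classic chaos demo.
def lorenz_step(state, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return state + dt * np.array([dx, dy, dz])

# Two starting states differing by one part in a million in x.
a = np.array([1.0, 1.0, 1.0])
b = a + np.array([1e-6, 0.0, 0.0])

for step in range(1, 4001):
    a, b = lorenz_step(a), lorenz_step(b)
    if step % 1000 == 0:
        print(f"t = {step * 0.005:5.1f}  separation = {np.linalg.norm(a - b):.3e}")
```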

The "hallucination problem" is simply the tyranny of Lorenz... one is not sure if a starting state will have a good outcome or swing wildly. Some good weather models are based on re-runing with tweaks to starting params, and then things that end up out of bounds can get tossed. Its harder to know when a result is out of bounds for an LLM, and we dont have the ability to run every request 100 times through various models to get an "average" output yet... However some of the reuse of layers does emulate this to an extent....