Comment by mbando

3 months ago

This also touches on the contrast between how human beings and LLMs trade compression for nuance. Human beings devote enormous resources to the long tail of the information distribution, for example in lexical items. Word frequencies follow Zipf's Law, so in the million-word FROWN corpus, roughly half the distinct words occur only once. When was the last time you used the word chrysanthemum, or corpulent? But did you have any difficulty recognizing them? So while human beings have limited scale compared to machines, we do have an enormous capacity for nuanced communication and conception.
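To make the long-tail point concrete, here is a rough sketch of how you could measure the share of distinct words that occur exactly once in a text. The function name `hapax_fraction`, the simple regex tokenizer, and the toy sentence are all my own assumptions for illustration; this is not run on FROWN, and a real corpus study would use proper tokenization.

```python
from collections import Counter
import re


def hapax_fraction(text: str) -> float:
    """Return the share of distinct word types that occur exactly once."""
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return 0.0
    # Count the word types whose frequency is exactly one.
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / len(counts)


# Toy sentence, not corpus data.
sample = "the cat sat on the mat and the corpulent cat admired a chrysanthemum"
print(f"{hapax_fraction(sample):.2f}")
```

A toy sentence like this is far too small to say anything about Zipf's Law; the point is only to show what "occurs one time" means when you count by distinct word rather than by running word.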

LLMs, by contrast, make the opposite trade-off. There are information-theoretic limits on how much information LLMs can store (roughly 3.6 bits per parameter), so they aggressively compress information and trade away nuance (https://arxiv.org/abs/2505.17117).
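For a sense of scale, here is back-of-the-envelope arithmetic that takes the roughly 3.6 bits per parameter figure above at face value. The 7-billion-parameter model size and the function name `capacity_gigabytes` are hypothetical choices of mine, and this is only a sketch of the multiplication, not a claim about any particular model.

```python
def capacity_gigabytes(n_params: float, bits_per_param: float = 3.6) -> float:
    """Rough storage capacity in gigabytes (1 GB = 1e9 bytes)."""
    total_bits = n_params * bits_per_param
    # 8 bits per byte, 1e9 bytes per gigabyte.
    return total_bits / 8 / 1e9


# Hypothetical model size chosen only for illustration.
n_params = 7e9
print(f"{capacity_gigabytes(n_params):.2f} GB")
```

Under those assumptions a model of that size would cap out at a few gigabytes of stored information, which is tiny next to the amount of text it is trained on, hence the pressure to compress.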