Comment by janalsncm
6 hours ago
This is a really solid writeup. LLMs are way too verbose in prose and code, and my suspicion is that this is driven mainly by the training mechanism.
Cross-entropy loss steers models toward garden-path sentences: using a paragraph to say something any person could say with a sentence, or even a few precise words. Long sentences are the low-perplexity (low statistical “surprise”) path.
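To make the perplexity claim concrete, here is a minimal sketch that scores two phrasings of the same idea with a small causal LM. It assumes the Hugging Face transformers API with gpt2 as a stand-in model, and the two example sentences are invented for illustration; if the claim holds, the padded phrasing should come out with the lower per-token perplexity.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# gpt2 is an arbitrary small stand-in model, not anything the writeup used.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Passing the input ids as labels makes the model return the mean
    # cross-entropy over the sequence; exp() of that loss is perplexity.
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"])
    return torch.exp(out.loss).item()

# Two made-up phrasings of the same idea, one terse and one padded.
terse = "Use short sentences."
verbose = ("It is generally advisable, in most situations, to make "
           "use of sentences that are relatively short in length.")

print(f"terse:   {perplexity(terse):.1f}")
print(f"verbose: {perplexity(verbose):.1f}")
```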