Comment by FromTheFirstIn

4 days ago

Isn’t the limit exactly what you’re describing? There’s always uncertainty, and your asymptote can approach its limit, but it does have a limit. That’s the limit to the intelligence. And this is just for cross-entropy loss. Even if you could get loss to 0, I’m still not convinced at all that an enormous semantic map and its convoluted geometries amount to intelligence.

If you reach E, you have a Bayes-optimal model of the conditional distribution (that is, the next token conditional on context). This is something I thought too, but even if you're a fraction of a nat above the floor, you could have enormous headroom in performance left, because among the irreducible noise there are still rare tokens that take a great deal of capability to predict. I'm not suggesting there truly is no cap on capability, just that this constant doesn't tell you what that cap is.
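To put rough numbers on that, here's a toy sketch. Everything in it is an assumption made up for illustration, not a measurement: the floor `E`, the share of "hard" tokens `hard_frac`, and the number of plausible candidates `k` per hard token. It treats the hard tokens as fully predictable by a capable enough model, so they add nothing to the floor itself.

```python
import math

def avg_loss(floor_nats, hard_frac, hard_token_loss):
    """Average cross-entropy per token: the irreducible floor plus the
    excess loss contributed by the rare 'hard' tokens."""
    return floor_nats + hard_frac * hard_token_loss

E = 1.7           # assumed irreducible entropy, nats/token (illustrative only)
hard_frac = 0.01  # assumed share of tokens that need real capability
k = 16            # assumed number of plausible candidates per hard token

# Weak model: guesses uniformly among the k candidates on hard tokens,
# so it pays ln(k) nats on each of them.
weak = avg_loss(E, hard_frac, math.log(k))

# Strong model: puts probability ~1 on the right hard token, so it pays ~0.
strong = avg_loss(E, hard_frac, 0.0)

gap = weak - strong
print(f"loss gap above the floor: {gap:.4f} nats/token")
print(f"hard-token accuracy: {1 / k:.1%} (weak) vs 100% (strong)")
```

Under these made-up numbers the two models differ by under 0.03 nats per token on average, yet one gets about 6% of the hard tokens right and the other gets all of them. That's the sense in which being close to E doesn't pin down how capable the model is.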

  • Yeah, it’s not a linear cap (x% entropy doesn’t mean x% wrong), but it does seem like a hard cap. To be honest, the more I’ve understood scaling laws, the more I think the elephant in the LLM room is the entropy of the language. It explains why coding languages are so much more tractable (they’ve got WAY less entropy), and it explains why we haven’t seen a step function in capabilities for LLMs since GPT-4, outside of building specific tooling for particular contexts. I think E is coming to dominate and there isn’t a workaround for it.