Comment by spadufed

2 years ago

> Is there some sense in which this isn't obvious to the point of triviality?

This is maybe a pedantic "yes", but it is also extremely relevant to the outstanding performance we see in tasks like programming. The issue is primarily the size of the correct output space (that is, the output space we are trying to model) and how that relates to the number of parameters. Basically, there is a fixed upper bound on the amount of complexity that can be encoded by a given number of parameters (obvious in principle, but we're starting to get some theory about how this works). Simple systems, or rather systems with simple rules, may fall below that upper bound, in which case correctness is achievable. For systems that are more complex relative to the parameter count, the model will still learn an approximation, but some error is guaranteed.
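To make the "fixed upper bound" idea concrete, here is a toy counting (pigeonhole) sketch. It assumes, purely for illustration, that a model's parameters amount to a fixed budget of `param_bits` bits and that the target is an arbitrary boolean function of `input_bits` input bits. Both function names are hypothetical. The sketch only shows that most targets can't be encoded exactly once the target space outgrows the parameter budget. It does not identify which targets fail, and it says nothing about how real networks, or the newer theory, actually behave.

```python
# Toy pigeonhole sketch (assumption: parameters are treated as a fixed bit budget).
# A model with `param_bits` bits has at most 2**param_bits distinct settings, while
# there are 2**(2**input_bits) distinct boolean functions on `input_bits` inputs.


def log2_representable_fraction(param_bits: int, input_bits: int) -> int:
    """log2 of an upper bound on the fraction of all boolean functions over
    `input_bits` inputs that some parameter setting could represent exactly."""
    log2_targets = 2 ** input_bits  # log2 of the number of possible truth tables
    log2_models = param_bits        # log2 of the number of parameter settings
    # Counting rules nothing out while models >= targets, so cap at 0 (fraction 1).
    return min(0, log2_models - log2_targets)


def max_exact_input_bits(param_bits: int) -> int:
    """Largest input width for which counting alone does not rule out
    representing every boolean function exactly (assumes param_bits >= 1)."""
    n = 0
    while 2 ** (n + 1) <= param_bits:
        n += 1
    return n


if __name__ == "__main__":
    budget = 64  # hypothetical: 64 bits worth of parameters
    print("max input bits with no counting obstruction:", max_exact_input_bits(budget))
    for n in (5, 6, 7, 10):
        exp = log2_representable_fraction(budget, n)
        print(f"{n} input bits -> at most 2^{exp} of all targets representable")
```

With a 64-bit budget, every function of up to 6 input bits could in principle fit. Past that point, the share of targets that can be represented exactly shrinks geometrically. This is the "below the bound, correctness is achievable; above it, error is unavoidable for most targets" picture in miniature.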

I am speculating now, but I seriously suspect that the space covering not only one or more human languages but also every fact we would want to encode into one of these models is far too big for correctness ever to be possible without RAG (retrieval-augmented generation). At least not without some massive pooling of compute, which may not be out of the question in the long term but would likely never be meant for individual use.

If you're interested, I highly recommend checking out some of the recent work on monosemanticity to see what fleshing out the relationship between model size and complexity looks like in the near term.