Comment by zwaps

4 hours ago

Scale changes the performance of LLMs.

Sometimes we go so far as to say there is "emergence" of qualitative differences. But that framing is not necessary (and emergence has not been proven to actually occur).

What is true is that the performance of LLMs at OOD tasks changes with scale.

So no, it's not the same as solving a math problem.

> What is true is that the performance of LLMs at OOD tasks changes with scale.

If scaling alone guaranteed strong OOD generalization, we’d expect the largest models to consistently top OOD benchmarks, but this isn’t the case. In practice, scaling primarily increases a model’s capacity to represent and exploit statistical relationships present in the training distribution. That reliably boosts in-distribution performance, but it yields limited gains on tasks that are distributionally distant from the training data, especially if the underlying dataset is unchanged. This is why trillion-parameter models trained on the same corpus may excel at tasks similar to those seen in training, yet won’t necessarily show proportional improvements on genuinely novel OOD tasks.
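
To make the in-distribution vs. OOD distinction concrete, here’s a toy Python sketch (my own illustration, not anything from the parent comment or a specific paper): polynomial regressors of increasing capacity stand in for "scale," fit on data from a narrow x-range and then scored both inside and outside that range. Extra capacity keeps improving the in-range fit but does little for extrapolation.

```python
# Hypothetical toy analogy: model capacity ("scale") vs. distribution shift.
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.sin(x)  # the "true" task

# Training data confined to [0, 3]; OOD test points drawn from [4, 6].
x_train = rng.uniform(0, 3, 200)
y_train = f(x_train) + rng.normal(0, 0.05, x_train.size)
x_id = rng.uniform(0, 3, 200)    # in-distribution test set
x_ood = rng.uniform(4, 6, 200)   # out-of-distribution test set

for degree in (1, 3, 6, 9):      # increasing "scale" = polynomial degree
    coeffs = np.polyfit(x_train, y_train, degree)
    mse_id = np.mean((np.polyval(coeffs, x_id) - f(x_id)) ** 2)
    mse_ood = np.mean((np.polyval(coeffs, x_ood) - f(x_ood)) ** 2)
    print(f"degree={degree}  in-dist MSE={mse_id:.4f}  OOD MSE={mse_ood:.2f}")
```

The analogy is loose, of course, but it captures the point: adding capacity while training on the same data mostly buys a better fit to the distribution you already have.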