Comment by fourthrigbt
11 days ago
Doesn’t sound like you paid all that much attention when learning ML. The curse of dimensionality doesn’t say that every problem has some ideal model size; it says that the amount of data needed to train scales with the dimensionality of the feature space. So with an LLM you can make the network much larger, but if you don’t grow the input token vocabulary, the input dimensionality stays fixed and the curse of dimensionality doesn’t even apply. Beyond that, there’s a principle in ML theory that larger models are almost always better: the number of params in the model is the dimensionality of the space in which you’re running gradient descent, and with every added dimension, local optima become rarer (most critical points turn out to be saddle points instead, which gradient descent can escape).
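A minimal sketch of the commenter's first point, the data-requirements sense of the curse of dimensionality: to cover the unit cube [0, 1]^d with a fixed resolution per axis, the number of grid cells (and so the number of samples needed for uniform coverage) grows exponentially in d. The function name and bin count here are illustrative, not from the comment.

```python
def grid_cells(dim: int, bins_per_axis: int = 10) -> int:
    """Number of cells in a regular grid over [0, 1]^dim
    with bins_per_axis divisions along each axis."""
    return bins_per_axis ** dim

# Coverage cost explodes with dimension, while model/parameter
# count is a separate axis entirely:
for d in (1, 2, 3, 10):
    print(d, grid_cells(d))  # 10, 100, 1000, 10_000_000_000
```

This is why growing the parameter count of a network does not by itself trigger the curse: the exponential blow-up is in the input (feature) dimensionality, not in the number of weights.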