Comment by yummyfajitas

13 years ago

One practical application of this stuff is in understanding high dimensional discriminants.

Consider two populations, A and B, each with N real-valued traits. Suppose each trait in group A is normally distributed with mean 0 and stdev 10, while each trait in group B is normally distributed with mean 1 and stdev 10 (this holds for trait 1, trait 2, etc.).

Each individual trait in these groups overlaps drastically. If we imagine that essentially all the mass of each normal distribution sits within a ball of radius 20 (two standard deviations), then for any single trait group A lives on the interval [-20, 20] while group B lives on [-19, 21]. Barely different, right?

On the other hand, in N-dimensional space the point (0,0,...,0) is at distance sqrt(N) from the point (1,1,...,1). Once sqrt(N) > 40 (say N = 1601), the ball of radius 20 around (0,0,...,0) and the ball of radius 20 around (1,1,...,1) don't overlap at all, and the discriminant f(traits) = sign(trait[1] + trait[2] + ... + trait[N]) works fantastically.
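
A quick simulation makes this concrete. This is only a sketch of the setup above, not anything from the original comment: the sample size, the random seed, and the choice to threshold the sum at its midpoint N/2 (which is what the sign-of-sum rule amounts to once the traits are centered between the two group means) are mine.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1601      # number of traits; sqrt(1601) > 40, so the radius-20 balls separate
n = 20_000    # individuals per group (arbitrary, just for the simulation)

# Group A: every trait ~ Normal(0, 10); Group B: every trait ~ Normal(1, 10)
A = rng.normal(0.0, 10.0, size=(n, N))
B = rng.normal(1.0, 10.0, size=(n, N))

# A single trait barely discriminates (threshold at the midpoint 0.5):
single = ((A[:, 0] < 0.5).mean() + (B[:, 0] > 0.5).mean()) / 2

# The summed discriminant is nearly perfect: the sums have means 0 and N
# with stdev 10*sqrt(N) ~ 400, so the midpoint N/2 sits ~2 stdevs from each mean.
pooled = ((A.sum(axis=1) < N / 2).mean() + (B.sum(axis=1) > N / 2).mean()) / 2

print(f"one trait:  {single:.3f}")   # ~0.52, barely better than a coin flip
print(f"all traits: {pooled:.3f}")   # ~0.98
```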

This is one reason why "big data" can work well: combining many weak predictors gets you a strong predictor.

See Lewontin's Fallacy for a biological example of this.

https://en.wikipedia.org/wiki/Lewontins_Fallacy

I like that explanation, coming from the data side. I wonder if the article's author is working with data or with a model for the data. There is a general rule that, especially for high-dimensional models, only a few parameters matter, because the eigenvalues of the sensitivity matrix fall off quickly (with roughly logarithmic density). That means the data is well-separated, as you described, and only a few parameter combinations control whether the model fits, while the rest are relatively unimportant. It's a different kind of pointy.
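
For the sensitivity-matrix point, here is a hedged sketch of the kind of spectrum being described. The sum-of-exponentials model, the decay rates, and the time grid are my own illustration, not anything from the article or the reply:

```python
import numpy as np

# Illustrative model: y(t; theta) = sum_i exp(-theta_i * t).
# The sensitivity (Gauss-Newton) matrix is J^T J, where J[k, i] = d y(t_k) / d theta_i.
t = np.linspace(0.1, 5.0, 200)
theta = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])   # hypothetical decay rates

# Jacobian of the model output with respect to each parameter
J = -t[:, None] * np.exp(-t[:, None] * theta[None, :])   # shape (200, 6)

eigvals = np.linalg.eigvalsh(J.T @ J)[::-1]               # largest first
print(eigvals / eigvals[0])
# The ratios fall off over many orders of magnitude: only a couple of
# parameter directions are "stiff", the rest barely affect the fit.
```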
