
Comment by sega_sai

3 days ago

Least squares and PCA minimize different loss functions. One is the sum of squares of vertical (y) distances, the other is the sum of closest distances to the line. That is what introduces the difference.

"...sum of squared distances to the line" would be a better description. But it also depends entirely on how covariance is estimated

That makes sense. Why does least squares skew the line downwards, though (vs. some other direction)? Seems arbitrary.

  • The Pythagorean distance assumes that some of the distance (difference) lies along the x axis and some along the y axis, and that the total distance is orthogonal to the fitted line.

    OLS assumes that x is given and that the distance is entirely due to the variance in y (so parallel to the y axis). It’s not the line that’s skewed, it’s the space.

  • I think it has to do with the ratio of \Sigma_{xx} to \Sigma_{yy}. I don't have time to verify that, but it should be easy to check analytically; the closed forms are sketched just below.
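
For reference, the closed forms in the same \Sigma notation make that check straightforward: the OLS slope involves only \Sigma_{xx} and \Sigma_{xy}, while the major-axis (PCA/TLS) slope involves all three second moments.

```latex
% OLS (y on x) slope
\hat\beta_{\mathrm{OLS}} = \frac{\Sigma_{xy}}{\Sigma_{xx}}

% PCA / major-axis (TLS) slope
\hat\beta_{\mathrm{TLS}}
  = \frac{\Sigma_{yy} - \Sigma_{xx}
          + \sqrt{(\Sigma_{yy} - \Sigma_{xx})^2 + 4\,\Sigma_{xy}^2}}
         {2\,\Sigma_{xy}}
```

One can check that for \Sigma_{xy} > 0 the OLS slope never exceeds the TLS slope (they coincide only at perfect correlation), so the OLS line is always tilted toward the x axis relative to the principal axis rather than in some arbitrary direction; the mirror statement holds for negative correlation.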

I find it helpful to view least squares as fitting the noise to a Gaussian distribution.

  • They both fit Gaussians, just different ones! OLS fits a 1D Gaussian to the set of errors in the y coordinates only, whereas TLS (PCA) fits a 2D Gaussian to the set of all (x,y) pairs.

    • Well, that was a knowledge gap, thank you! I certainly need to review PCA, but Python makes it a bit too easy.

  • The OLS estimator is the minimum-variance linear unbiased estimator even without the assumption of a Gaussian distribution.

    • Yes, and if I remember correctly, you get the Gaussian because it's the maximum-entropy (fewest additional assumptions about the shape) continuous distribution for a given variance.


  • Both of these do, in a way. They just differ in which Gaussian distribution they're fitting.

    And how, I suppose. PCA is effectively moment matching, least squares is maximum likelihood. These correspond to the two ways of minimizing the Kullback-Leibler divergence to or from a Gaussian distribution.
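
To make the "both fit Gaussians, just different ones" point above concrete, here is a minimal sketch (NumPy + SciPy; the synthetic data and variable names are illustrative, not from the thread). The OLS slope drops out of an explicit 1D Gaussian likelihood over the vertical residuals, while the PCA/TLS slope is the major axis of the 2D Gaussian fitted to the (x, y) cloud, whose maximum-likelihood covariance is just the sample covariance (i.e. moment matching).

```python
# Sketch of the "both fit Gaussians" view, on made-up data:
#   OLS : maximize a 1D Gaussian likelihood of the residuals y - (a + b*x)
#   PCA : fit a 2D Gaussian to (x, y) (mean + covariance), take its major axis
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.normal(size=400)
y = 0.8 * x + rng.normal(scale=0.6, size=400)

# OLS via an explicit Gaussian likelihood over the y-residuals only
def neg_log_lik(params):
    a, b, log_sigma = params
    r = y - (a + b * x)
    return 0.5 * np.sum((r * np.exp(-log_sigma)) ** 2) + r.size * log_sigma

fit = minimize(neg_log_lik, x0=np.zeros(3))
C = np.cov(x, y)
print("OLS slope, Gaussian MLE    :", round(fit.x[1], 3))
print("OLS slope, closed form     :", round(C[0, 1] / C[0, 0], 3))  # same number

# PCA via the 2D Gaussian fit: the MLE covariance is the sample covariance,
# and the fitted ellipse's major axis is its leading eigenvector
w, V = np.linalg.eigh(C)                 # eigenvalues sorted ascending
print("PCA/TLS slope, major axis  :", round(V[1, -1] / V[0, -1], 3))
```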