
Comment by sega_sai

3 days ago

Least squares and PCA minimize different loss functions. One is the sum of squares of vertical (y) distances, the other is the sum of closest distances to the line. That is what introduces the difference.

"...sum of squared distances to the line" would be a better description. But it also depends entirely on how covariance is estimated

That makes sense. Why does least squares skew the line downwards, though (vs. some other direction)? Seems arbitrary.

  • The Pythagorean distance assumes that some of the distance (difference) lies along the x axis and some along the y axis, and that the total distance is orthogonal to the fitted line.

    OLS assumes that x is given and that the distance is entirely due to the variance in y (so parallel to the y axis). It’s not the line that’s skewed, it’s the space.

  • I think it has to do with the ratio of \Sigma_{xx} to \Sigma_{yy}. I don't have time to verify that, but it should be easy to check analytically; the closed forms are sketched just below.
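
For reference, the closed forms in the same \Sigma notation make that check straightforward: the OLS slope involves only \Sigma_{xx} and \Sigma_{xy}, while the major-axis (PCA/TLS) slope involves all three second moments.

```latex
% OLS (y on x) slope
\hat\beta_{\mathrm{OLS}} = \frac{\Sigma_{xy}}{\Sigma_{xx}}

% PCA / major-axis (TLS) slope
\hat\beta_{\mathrm{TLS}}
  = \frac{\Sigma_{yy} - \Sigma_{xx}
          + \sqrt{(\Sigma_{yy} - \Sigma_{xx})^2 + 4\,\Sigma_{xy}^2}}
         {2\,\Sigma_{xy}}
```

One can check that for \Sigma_{xy} > 0 the OLS slope never exceeds the TLS slope (they coincide only at perfect correlation), so the OLS line is always tilted toward the x axis relative to the principal axis rather than in some arbitrary direction; the mirror statement holds for negative correlation.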

I find it helpful to view least squares as fitting the noise to a Gaussian distribution.

  • They both fit Gaussians, just different ones! OLS fits a 1D Gaussian to the set of errors in the y coordinates only, whereas TLS (PCA) fits a 2D Gaussian to the set of all (x,y) pairs.

    • Well, that was a knowledge gap, thank you! I certainly need to review PCA, but Python makes it a bit too easy.

  • The OLS estimator is the minimum-variance linear unbiased estimator even without the assumption of a Gaussian distribution.

    • Yes, and if I remember correctly, you get the Gaussian because it's the maximum-entropy (fewest additional assumptions about the shape) continuous distribution for a given variance.


  • Both of these do, in a way. They just differ in which Gaussian distribution they're fitting.

    And how, I suppose. PCA is effectively moment matching, least squares is maximum likelihood. These correspond to the two ways of minimizing the Kullback-Leibler divergence to or from a Gaussian distribution.
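
To make the "both fit Gaussians, just different ones" point above concrete, here is a minimal sketch (NumPy + SciPy; the synthetic data and variable names are illustrative, not from the thread). The OLS slope drops out of an explicit 1D Gaussian likelihood over the vertical residuals, while the PCA/TLS slope is the major axis of the 2D Gaussian fitted to the (x, y) cloud, whose maximum-likelihood covariance is just the sample covariance (i.e. moment matching).

```python
# Sketch of the "both fit Gaussians" view, on made-up data:
#   OLS : maximize a 1D Gaussian likelihood of the residuals y - (a + b*x)
#   PCA : fit a 2D Gaussian to (x, y) (mean + covariance), take its major axis
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.normal(size=400)
y = 0.8 * x + rng.normal(scale=0.6, size=400)

# OLS via an explicit Gaussian likelihood over the y-residuals only
def neg_log_lik(params):
    a, b, log_sigma = params
    r = y - (a + b * x)
    return 0.5 * np.sum((r * np.exp(-log_sigma)) ** 2) + r.size * log_sigma

fit = minimize(neg_log_lik, x0=np.zeros(3))
C = np.cov(x, y)
print("OLS slope, Gaussian MLE    :", round(fit.x[1], 3))
print("OLS slope, closed form     :", round(C[0, 1] / C[0, 0], 3))  # same number

# PCA via the 2D Gaussian fit: the MLE covariance is the sample covariance,
# and the fitted ellipse's major axis is its leading eigenvector
w, V = np.linalg.eigh(C)                 # eigenvalues sorted ascending
print("PCA/TLS slope, major axis  :", round(V[1, -1] / V[0, -1], 3))
```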