Comment by pvillano
3 months ago
What's the goal of this article?
There exists a problem in real life that you can solve in the simple case, and invoke a theorem in the general case.
Sure, it's unintuitive that I shouldn't go all in on the smallest-variance choice. That's a great start. But learning the formula and a proof doesn't update that bad intuition. How can I get a generalizable feel for these types of problems? Is there a more satisfying "why" than "because the math works out"? Does anyone else find it much easier to criticize others than themselves and want to proofread my next blog post?
Here's my intuition: you can reduce the variance of a measurement by averaging multiple independent measurements. That's because when they're independent, the worst-case scenario of all the errors lining up is pretty unlikely. This is a slightly different situation, because the random variables aren't necessarily measurements of a single quantity, but otherwise it's pretty similar, and the intuition that multiple independent errors are unlikely to all line up still applies.
Once you have that intuition, the math just tells you what the optimal mix is, if you want to minimize the variance.
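To make that concrete, here's a minimal sketch in Python (the variable names and numbers are mine, not from the article): for two independent variables, the variance of t*X + (1-t)*Y is minimized by inverse-variance weighting, t* = Var(Y) / (Var(X) + Var(Y)).

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent random variables with different variances
# (illustrative values, not from the article).
sigma_x, sigma_y = 1.0, 2.0
x = rng.normal(0.0, sigma_x, size=1_000_000)
y = rng.normal(0.0, sigma_y, size=1_000_000)

# Var(t*X + (1-t)*Y) = t^2*Var(X) + (1-t)^2*Var(Y) is minimized at
# t* = Var(Y) / (Var(X) + Var(Y)), i.e. inverse-variance weighting.
t_star = sigma_y**2 / (sigma_x**2 + sigma_y**2)

for t in (0.0, 0.5, t_star, 1.0):
    mix = t * x + (1 - t) * y
    print(f"t = {t:.2f}  empirical stddev = {mix.std():.4f}")
# The mix at t* beats "going all in" on the lower-variance variable (t = 1).
```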
This all hinges on the fact that variance scales quadratically, not linearly: Var(tX) = t^2 Var(X). If we look at the standard deviation instead, we get the expected homogeneity, stddev(tX) = |t| stddev(X). However, it is *not additive*: for independent variables, stddev(sum_i t_i X_i) = sqrt(sum_i t_i^2 stddev(X_i)^2).
Quantitatively speaking, for weights 0 < t_i < 1 summing to 1, each t_i^2 is strictly smaller than t_i. As such, the standard deviation of a convex combination of independent variables is always strictly smaller than the same convex combination of their standard deviations: stddev(sum_i t_i X_i) < sum_i t_i stddev(X_i), as long as at least two of the weighted variables have nonzero variance.
What this means in practice is that mixing any number of independent random variables with positive coefficients summing to 1 always gives a standard deviation smaller than the corresponding weighted average of their standard deviations, and in particular smaller than the largest of them.
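A quick numerical check of that inequality (the standard deviations and weights below are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Three independent variables with different standard deviations
# (purely illustrative values).
sigmas = np.array([1.0, 3.0, 5.0])
samples = rng.normal(0.0, sigmas, size=(n, 3))

# A convex combination: positive weights summing to 1.
t = np.array([0.5, 0.3, 0.2])
mix = samples @ t

lhs = mix.std()           # stddev of the combination
rhs = float(t @ sigmas)   # same combination of the stddevs
print(f"stddev(sum t_i X_i) = {lhs:.3f} < sum t_i stddev(X_i) = {rhs:.3f}")
# lhs ~ sqrt(sum t_i^2 sigma_i^2) = sqrt(2.06) ~ 1.44, while rhs = 2.4
```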
> Sure, it's unintuitive that I shouldn't go all in on the smallest variance choice.
Is it?
You have ten estimates of some distance, each with similar accuracy on the order of 10 m: you take the average (and reduce the error by a factor of about three, roughly sqrt(10)).
If you increase the precision of one measurement by 1%, will you then disregard all the others?
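A rough simulation of that scenario (the 10 m and 1% figures are the ones above; everything else is made up):

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials = 200_000

# Ten independent distance estimates, each with ~10 m error; one of them
# is 1% more precise.
sigmas = np.full(10, 10.0)
sigmas[0] = 9.9
errors = rng.normal(0.0, sigmas, size=(n_trials, 10))

best_only = errors[:, 0]          # "go all in" on the most precise estimate
plain_avg = errors.mean(axis=1)   # simple average of all ten

print(f"best single estimate: stddev ~ {best_only.std():.2f} m")
print(f"average of all ten:   stddev ~ {plain_avg.std():.2f} m")
# Averaging cuts the error by roughly sqrt(10) ~ 3.2x; the 1% edge of the
# single best estimate is negligible by comparison.
```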