Comment by ajuc
6 years ago
When the correlation is close to 0 it's often because of a feedback loop.
For example, in an economy with a central bank trying to hit an inflation target, interest rates and inflation will have near-zero correlation (interest rates change but inflation stays roughly constant). That's because the central bank adjusts interest rates to counter other variables so that inflation remains near the target.
Other example (my favorite, it was mindblowing when my teacher showed it to us on econometrics as a warning :) ) - gas pedal and speed of a car driving on a hilly road. Driver wants to drive near the speed limit, so he adjusts the gas pedal to keep the speed constant. Simplistic conclusion would be - speed is constant despite the gas pedal position changing therefore they are unrelated :)
That's a good point. Another one I forgot to make: given the established empirical reality of 'everything is correlated', if you find a variable which does in fact seem to be independent of most or everything else, that alone makes that variable suspicious - it suggests that it may be a pseudo-variable, composed largely or entirely of measurement error/randomness, or perhaps afflicted by a severe selection bias or other problem (such as range restriction or Berkson's paradox eliminating the real correlation).
Somewhat similarly, because 'everything is heritable', if you run into a human trait which is not heritable at all and is precisely estimated at h^2~0, that casts considerable doubt on whether you have a real trait at all. (I've seen this happen to a few latent variables extracted by factor analysis: they have near-zero heritability in a twin study and, on further investigation, turn out to have been just sampling error or bad factor analysis in the first place, and don't replicate or predict anything or satisfy any of the criteria you might use to decide if a trait is 'real'.)
That's very interesting. In the car driving example we can define three variables: 1) Throttle 2) Speed 3) Elevation derivative
If "3" is constant (ex: flat terrain) then "1" and "2" will have strong correlation. However if "2" is constant (ex: cruise control) as in your example, "1" and "3" will have strong correlation.
In the economic example, however, this kind of analysis would be much more complex and would have to take plenty of variables into account.
The key point is identifying those variables and ensuring they remain constant (in the car example: tire pressure, elevation, fuel load, etc.)
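To make the three-variable version concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the toy model `speed = throttle - 10 * slope + noise`, the target of 100, and the names `simulate` and `pearson` are all made up, and a "perfect" driver cancels the hill exactly.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n, cruise_control, flat, seed=0):
    """Toy car (assumed model): speed = throttle - 10 * slope + noise."""
    rng = random.Random(seed)
    target = 100.0
    throttle, speed, slope = [], [], []
    for _ in range(n):
        s = 0.0 if flat else rng.gauss(0.0, 1.0)  # elevation derivative
        if cruise_control:
            t = target + 10.0 * s         # driver cancels the hill exactly
        else:
            t = rng.uniform(50.0, 150.0)  # throttle varied freely
        v = t - 10.0 * s + rng.gauss(0.0, 1.0)
        throttle.append(t)
        speed.append(v)
        slope.append(s)
    return throttle, speed, slope


# Flat road, throttle varied freely: throttle and speed move together.
t, v, s = simulate(5000, cruise_control=False, flat=True)
print("flat road     corr(throttle, speed):", round(pearson(t, v), 3))

# Hilly road, speed held at target: throttle tracks the slope instead.
t, v, s = simulate(5000, cruise_control=True, flat=False)
print("cruise ctrl   corr(throttle, speed):", round(pearson(t, v), 3))
print("cruise ctrl   corr(throttle, slope):", round(pearson(t, s), 3))
```

In this toy setup the throttle/speed correlation should come out close to 1 on the flat road and close to 0 under cruise control, while throttle and slope become almost perfectly correlated, even though the causal link from throttle to speed is identical in both runs.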
> Other example (my favorite, it was mindblowing when my teacher showed it to us on econometrics as a warning :) ) - gas pedal and speed of a car driving on a hilly road. Driver wants to drive near the speed limit, so he adjusts the gas pedal to keep the speed constant. Simplistic conclusion would be - speed is constant despite the gas pedal position changing therefore they are unrelated :)
I think that's Milton Friedman's Thermostat in case you want to search for it.
Good discussion. On the flip side, in my data mining class the professor keeps saying ~"you may be able to find clusters in a data set, but often no true correlation exists." However, that's an absolute statement I just don't swallow. In my mind what I see is that if an unexplained correlation or non-correlation appears, it may be random (or true) or it could be the result of an unmeasured (hidden) variable. In your two examples, you're simply pointing out two respective hidden variables that weren't accounted for in the original analysis.
I think any data analysis should always be caveated with the understanding that there may be hidden variables shrouding or perhaps enhancing correlations - from economics to quantum mechanics. It's up to the reviewer of the results to determine, subjectively or by using a standard measure, whether the level of rigor involved in data collection & analysis sufficiently models reality.
Perhaps they are trying to explain the clustering illusion? That's the phenomenon that even random data will produce clusters. You can take that further and state that random data WILL produce clusters. If you don't have clusters then your data is not random and some pattern is at play.
This really trips up our minds, because we try to find patterns everywhere. If you try to plot random dots by hand you will usually place them without clusters. A truly random plot will have clusters.
https://en.wikipedia.org/wiki/Clustering_illusion
Edit: Note your professor said "often" which means they did not make an absolute statement
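A quick way to see the clustering illusion for yourself, as a rough sketch: drop uniformly random points on a unit square, bin them into a grid, and compare with the "one point per cell" picture most people would draw by hand. The grid size, point count and the helper name `cell_counts` are arbitrary choices for illustration.

```python
import random
from collections import Counter


def cell_counts(n_points, grid, seed=0):
    """Drop n_points uniformly on the unit square; count points per grid cell."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        counts[(int(x * grid), int(y * grid))] += 1
    return counts


GRID = 20
N_POINTS = GRID * GRID  # one point per cell if they were spread out evenly

counts = cell_counts(N_POINTS, GRID)
empty_cells = GRID * GRID - len(counts)
largest_cluster = max(counts.values())

print("cells:", GRID * GRID)
print("empty cells:", empty_cells)
print("most points in one cell:", largest_cluster)
```

With as many points as cells, an even spread would leave no cell empty. Independent uniform points should instead leave roughly a third of the cells empty (about 1/e of them) and pile several points into a few cells, which is exactly the clumpy look people mistake for a pattern.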
Ipso facto all "natural" variables are related to a bounded random walk, which produces clusters (a Markovian process), or otherwise have complex chaotic (e.g. fractal) mechanics, which also produce clusters. This follows from physics.
Maximum entropy as well as zero entropy is a very rare state to observe.
>"The phenomenon that even random data will produce clusters."
You don't really mean "random", you mean i.i.d. You can have a statistical model where the outcome is random but not independent of past values (e.g. the next step of a Markov chain).
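A small sketch of the difference, assuming Gaussian noise and a plain random walk as the simplest example of a Markov process (both choices, and the helper name `lag1_autocorr`, are mine for illustration): each series is random, but only the first is independent from one step to the next.

```python
import random


def lag1_autocorr(xs):
    """Sample autocorrelation between each value and the one right after it."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den


rng = random.Random(0)

# i.i.d. noise: every value is drawn independently of the past.
iid = [rng.gauss(0.0, 1.0) for _ in range(5000)]

# Random walk: the next position depends on the current one (Markov),
# even though every individual step is random.
walk = []
position = 0.0
for step in iid:
    position += step
    walk.append(position)

print("lag-1 autocorrelation, i.i.d. noise:", round(lag1_autocorr(iid), 3))
print("lag-1 autocorrelation, random walk :", round(lag1_autocorr(walk), 3))
```

The i.i.d. series should show lag-1 autocorrelation near 0, while the walk built from the very same random steps should be close to 1.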
The ability of adults to drink milk and fluency in speaking English are well correlated. This is because those of northern European ancestry are more likely to be able to drink milk, and it happens that most people of northern European ancestry either immigrated to an English-speaking country (US) 150 years ago, or are in a country where English instruction is good.
It's probably in the same vein as the classic quote by Tukey "The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data." - a "need" for an answer can easily motivate people to mangle the data in order to find it even if it doesn't exist.
I like the gas pedal example. I read a similar one somewhere where we measure temperature inside the house and energy usage by the heater. The energy usage is correlated with outside temperature, but inside temperature stays constant, so we conclude that inside temperature is unrelated to heater and turn the heater off.
I really like that example, but I am wondering if it would really be true? Real drivers would not maintain a perfect speed, but would instead work to maintain the average. If you looked closely at the speed, it would drift away from the average, then the pedal would move to return it to the average. So it would look a bit like an integral (the I in PID control) of the difference from the mean speed, right?
Yup, that's how you know I only had this at university, never used it in real life :) I think in real life you might see the feedback loop in motion, or not, depending on the resolution and sampling.
Well, we're obviously talking about perfectly spherical drivers (good point though).
Yes I figured :) Just pointing out that reality is always a bit more nuanced. To put it more simply, the pedal position could be seen as an error accumulator.
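Here is a minimal sketch of the "error accumulator" idea, i.e. an integral-only controller. The plant model (`speed = pedal - hill`), the sinusoidal hill, the gain `ki = 0.5` and the function name `drive` are all assumptions picked to keep the example tiny, not a model of a real car or driver.

```python
import math


def drive(n=1000, target=100.0, ki=0.5):
    """Integral-only driver: the pedal accumulates the speed error over time."""
    pedal = target
    pedals, speeds, hills = [], [], []
    for t in range(n):
        hill = 10.0 * math.sin(2 * math.pi * t / 200)  # slowly rolling hills
        speed = pedal - hill                           # toy plant (assumed)
        pedals.append(pedal)
        speeds.append(speed)
        hills.append(hill)
        pedal += ki * (target - speed)                 # accumulate the error
    return pedals, speeds, hills


pedals, speeds, hills = drive()
errors = [100.0 - v for v in speeds]

# Speed is not perfectly constant: it drifts, then gets pulled back.
print("largest speed error:", round(max(abs(e) for e in errors), 3))

# The pedal really is the running sum of past errors (times the gain).
accumulated = 100.0 + 0.5 * sum(errors[:-1])
print("final pedal:", round(pedals[-1], 6), "accumulated:", round(accumulated, 6))
```

In this sketch the speed wanders a little around the target instead of sitting exactly on it, so with fine enough sampling you would see the feedback loop at work, which matches the point about resolution and sampling made above.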
Pressing the brake is positively correlated with the car going faster. Downhill.
Good thing correlation is not an indicator of causation.