When the correlation is close to 0, it's often because of a feedback loop.
For example, in an economy with a central bank trying to hit an inflation target, interest rates and inflation will have near-zero correlation (interest rates change but inflation stays roughly constant). That's because the central bank adjusts interest rates to counter other variables so that inflation remains near the target.
Another example (my favorite; it was mind-blowing when my teacher showed it to us in econometrics as a warning :) ): the gas pedal and the speed of a car driving on a hilly road. The driver wants to stay near the speed limit, so he adjusts the gas pedal to keep the speed constant. The simplistic conclusion would be: the speed is constant despite the gas pedal position changing, therefore they are unrelated :)
That's a good point. Another one I forgot to make: given the established empirical reality of 'everything is correlated', if you find a variable which does in fact seem to be independent of most or everything else, that alone makes that variable suspicious - it suggests that it may be a pseudo-variable, composed largely or entirely of measurement error/randomness, or perhaps afflicted by a severe selection bias or other problem (such as range restriction or Berkson's paradox eliminating the real correlation).
Somewhat similarly, because 'everything is heritable', if you run into a human trait which is not heritable at all and is precisely estimated at h^2 ~ 0, that casts considerable doubt on whether you have a real trait at all. (I've seen this happen to a few latent variables extracted by factor analysis: they have near-zero heritability in a twin study and, on further investigation, turn out to have been just sampling error or bad factor analysis in the first place, and don't replicate or predict anything or satisfy any of the criteria you might use to decide if a trait is 'real'.)
That's very interesting. In the car driving example we can define three variables: 1) Throttle 2) Speed 3) Elevation derivative
If "3" is constant (ex: flat terrain) then "1" and "2" will have strong correlation. However if "2" is constant (ex: cruise control) as in your example, "1" and "3" will have strong correlation.
In the economic example, however, this kind of analisys should be much more complex and take plenty of variables into account.
The key point is identifying those variables and ensuring they remain constant (e.g., in that example: tire pressure, elevation, fuel load, etc.).
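To put rough numbers on the throttle/speed/gradient point, here is a minimal Python sketch (all coefficients and noise levels are made up, and each road segment is treated as an independent observation rather than simulating the full feedback dynamics):

    # Toy "physics": speed = 5 + 30*throttle - 200*grade + noise (made-up numbers).
    import numpy as np

    rng = np.random.default_rng(42)
    n = 10_000
    noise = rng.normal(0, 0.5, n)            # wind, measurement error, etc.

    # Case 1: flat terrain, driver varies the throttle freely.
    throttle = rng.uniform(0.4, 0.8, n)
    speed = 5 + 30 * throttle + noise
    print(np.corrcoef(throttle, speed)[0, 1])     # ~0.99: throttle and speed strongly correlated

    # Case 2: hilly road, driver picks the throttle needed to hold ~25 m/s.
    grade = rng.normal(0, 0.02, n)                # road gradient on each segment
    driver_error = rng.normal(0, 0.01, n)         # imperfect compensation
    throttle = (25 - 5 + 200 * grade) / 30 + driver_error
    speed = 5 + 30 * throttle - 200 * grade + noise
    print(np.corrcoef(throttle, speed)[0, 1])     # close to 0: speed barely varies with throttle
    print(np.corrcoef(throttle, grade)[0, 1])     # ~1: throttle now tracks the terrain

The contrast is the whole point: the causal link from throttle to speed is identical in both cases; only the driver's behaviour changes which correlations show up.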
> Another example (my favorite; it was mind-blowing when my teacher showed it to us in econometrics as a warning :) ): the gas pedal and the speed of a car driving on a hilly road. The driver wants to stay near the speed limit, so he adjusts the gas pedal to keep the speed constant. The simplistic conclusion would be: the speed is constant despite the gas pedal position changing, therefore they are unrelated :)
I think that's Milton Friedman's Thermostat in case you want to search for it.
Good discussion. On the flip side, in my data mining class the professor keeps saying ~"you may be able to find clusters in a data set, but often no true correlation exists." However, that's an absolute statement I just don't swallow. In my mind, if an unexplained correlation or non-correlation appears, it may be random (or true), or it could be the result of an unmeasured (hidden) variable. In your two examples, you're simply pointing out two respective hidden variables that weren't accounted for in the original analysis.
I think any data analysis should always be caveated with the understanding that there may be hidden variables shrouding or perhaps enhancing correlations - from economics to quantum mechanics. It's up to the reviewer of the results to determine, subjectively or by using a standard measure, whether the level of rigor involved in data collection & analysis sufficiently models reality.
Perhaps they are trying to explain the clustering illusion: the phenomenon that even random data will produce clusters. You can take that further and state that random data WILL produce clusters; if you don't have clusters, then your data is not random and some pattern is at play.
This really trips up our minds, since we try to find patterns everywhere. If you ask people to plot random dots by hand, they will usually place the dots without clusters; a truly random plot will have clusters.
https://en.wikipedia.org/wiki/Clustering_illusion
Edit: Note that your professor said "often", which means they did not make an absolute statement.
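To make the "random data will produce clusters" point concrete, here is a small sketch (arbitrary numbers): 100 uniformly random points on a 10x10 grid average one point per cell, yet the cell counts come out very lumpy.

    # Drop points uniformly at random and look at how uneven the cell counts are.
    import numpy as np

    rng = np.random.default_rng(1)
    points = rng.uniform(0, 10, size=(100, 2))    # 100 random points, 1 per cell on average
    cells = np.floor(points).astype(int)
    counts = np.zeros((10, 10), dtype=int)
    np.add.at(counts, (cells[:, 0], cells[:, 1]), 1)

    print("empty cells: ", int((counts == 0).sum()))   # typically ~35-40 of the 100 cells
    print("busiest cell:", int(counts.max()))          # typically 4 or 5 points piled together

If you asked people to place 100 dots "randomly" by hand, most would produce far fewer empty cells and far less pile-up than this.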
The ability of adults to drink milk and fluency in English are well correlated. This is because people of northern European ancestry are more likely to be able to digest milk, and it happens that most people of northern European ancestry either immigrated to an English-speaking country (the US) 150 years ago or live in a country where English instruction is good.
It's probably in the same vein as the classic Tukey quote: "The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data." A "need" for an answer can easily motivate people to mangle the data in order to find it even if it doesn't exist.
I like the gas pedal example. I read a similar one somewhere: we measure the temperature inside a house and the energy usage of the heater. The energy usage is correlated with the outside temperature, but the inside temperature stays constant, so we conclude that the inside temperature is unrelated to the heater and turn the heater off.
I really like that example, but I am wondering whether it would really hold. Real drivers would not maintain a perfect speed, but would instead work to maintain the average. If you looked closely at the speed, it would drift away from the average, and then the pedal would move to return it to the average. So it would look a bit like an integral (the I in PID control) of the difference from the mean speed, right?
Yup, that's how you know I only had this at university and never used it in real life :) I think in real life you might see the feedback loop in motion, or not, depending on the resolution and sampling.
Well, we're obviously talking about perfectly spherical drivers (good point though).
Pressing the brake is positively correlated with the car going faster. Downhill.
Good thing correlation is not an indicator of causation.
It is true that, as Fisher points out, with enough samples you are almost guaranteed to reject the null hypothesis. That's why we tell students to consider both p values (which you could think of as a form of quality control on the dataset) and variance explained. Loftus and Loftus make the point nicely: p tells you if you have enough samples and any effect to consider, variance explained tells you if it's worth pursuing. Both are useful guides to a thoughtful analysis. In addition, I'd make a case for thinking about the scientific significance and importance of the hypothesis and the Bayesian prior. And to put a positive spin on this, given how easy it is to get small p values, big ones are pretty much a red flag to stop the analysis and go and do something more productive instead.
> "It is true that, as Fisher points out, with enough samples you are almost guaranteed to reject the null hypothesis. "
Where does Fisher point this out?
> "That's why we tell students to consider both p values (which you could think of as a form of quality control on the dataset)"
How is this "quality control"? It just tells you whether your sample size was large enough to pass an arbitrary threshold...
> Where does Fisher point this out?
Probably in the Fisher excerpt.
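To put rough numbers on the p-value versus variance-explained point above, here is a quick sketch (made-up data: a true correlation of 0.02 at n = 1,000,000):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 1_000_000
    x = rng.normal(size=n)
    y = 0.02 * x + np.sqrt(1 - 0.02**2) * rng.normal(size=n)   # true correlation 0.02

    r, p = stats.pearsonr(x, y)
    print(f"p = {p:.3g}")          # astronomically small: 'significant' by any threshold
    print(f"r^2 = {r**2:.5f}")     # ~0.0004: explains about 0.04% of the variance

The p-value says the dataset is big enough to see something; the variance explained says the something is negligible.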
Agree that NHST using a simple point null hypothesis of the form
H0: μ = 0
doesn't provide much value. H0 is never true, and the conclusion of "rejecting H0" based on a p-value is therefore not terribly profound. Also, the "rejecting H0" conclusion doesn't really tell you anything about the alternative hypothesis HA (which isn't even considered when computing the p-value, since the p-value is computed under H0). Dichotomies in general are bad, but NHST with a point H0 is useless!
However, a composite hypothesis setup of the form
H0: μ ≤ 0
HA: μ > 0
is probabilistically sound (insofar as some journal requires you to report a p-value). It's much better to report an effect size estimate and/or CI.
That still gives 50-50 odds of rejection with a sufficient sample size, which is not much of a test of the research hypothesis (since many alternatives will predict the same direction). It is better than a 100% chance of rejection, though.
Couldn't you make an argument that the point H0 has a use when you are testing whether two populations are identical? I.e., it's probably true that μ is very close to 0 if it is the difference in heights between men from Nebraska and men from Iowa.
You've kind of hit the point with the second half of your comment. Two populations are virtually never identical, so you don't need any statistics to answer the question. A more reasonable question is whether or not you have the statistical power (i.e. measurement precision) to see the difference, and whether the difference is big enough to matter.
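As a sketch of the "report an effect size and/or CI" suggestion, using the (entirely fabricated) Nebraska-vs-Iowa height example from this sub-thread:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    n = 200_000
    nebraska = rng.normal(179.0, 7.0, n)   # heights in cm, made-up parameters
    iowa     = rng.normal(179.1, 7.0, n)   # true difference of 0.1 cm

    diff = iowa.mean() - nebraska.mean()
    se = np.sqrt(iowa.var(ddof=1) / n + nebraska.var(ddof=1) / n)
    ci = (diff - 1.96 * se, diff + 1.96 * se)
    t, p = stats.ttest_ind(iowa, nebraska, equal_var=False)

    print(f"difference = {diff:.3f} cm, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
    print(f"p = {p:.3g}")   # the point null is (almost surely) rejected here,
                            # but the CI shows the effect is on the order of a millimetre

The p-value only says the populations differ; the effect size and CI say by how much and how precisely, which is what you actually need to decide whether the difference matters.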
This reminds me of the current omnigenic hypothesis in genetics: that, unexpectedly, almost every gene seems to affect the expression of traits.
https://www.quantamagazine.org/omnigenic-model-suggests-that...
"Drawing on GWAS analyses of three diseases, they concluded that in the cell types that are relevant to a disease, it appears that not 15, not 100, but essentially all genes contribute to the condition. The authors suggested that for some traits, “multiple” loci could mean more than 100,000."
That is just a special case of the "everything is correlated" principle.
I think a major issue here is that, perhaps, there is a tendency to want to use statistics to decide what the 'truth' is, because it takes the onus of responsibility for making a mistake away from the interpreter. It's nice to be able to stand behind a p-value and not be accountable for whatever argument is being made. But the issue is that almost any argument can be made in a large enough dataset, and a careful analyst will find significance.
This is of course the case only if one does not venture far from the principal assumptions of frequentism, most of which are routinely violated in almost every setting except pure random number generation and fundamental quantum physics.
So a central issue that isn't addressed in STATS101-level hypothesis testing is the impact that the question has on the result. It's almost inevitable that people want to interpret a failure to reject as a positive result. But a p-value really doesn't tell you whether a result is useful; rather, it tells you whether your sample size is big enough to detect a difference.
Statistical significance is something that can be calculated. Practical significance is something that needs to be interpreted.
I think this article is trying to tie two things together: the p-value problem and the fact that you can always throw in more data.
I disagree.
It's cheating, it goes against experimental design analysis, and it does not differentiate between given data and data that was carefully collected. We have experimental design classes for a reason: they help us to be honest. Of course there are tons of pitfalls a novice statistician can fall into.
It also implicitly leads people to think that statistics can magically handle given data and big data the old-fashioned way. If you do that, then of course you'll get a good p-value.
> It's cheating, it goes against experimental design analysis, and it does not differentiate between given data and data that was carefully collected. We have experimental design classes for a reason: they help us to be honest. Of course there are tons of pitfalls a novice statistician can fall into.
Explicit sequential testing runs into exactly the same problem. The problem is, the null hypothesis is not true. So no matter whether you use fixed (large) sample sizes or adaptive procedures which can terminate early while still preserving (the irrelevant) nominal false-positive error rates, you will at some sample size reject the null as your power approaches 100%.
This is mostly right, but you are still thinking of these rejections as "false positives" for some reason. They are real deviations from the null hypothesis ("true positives"). The problem is that the user didn't test the null model they wanted; it is 100% user error.
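A small sketch of the power point above, using the normal approximation and made-up numbers (true mean 0.01 standard deviations away from the point null, two-sided α = 0.05): the rejection rate climbs toward 100% no matter how trivial the true effect is.

    from scipy import stats

    mu, sigma, alpha = 0.01, 1.0, 0.05
    z_crit = stats.norm.ppf(1 - alpha / 2)

    for n in [100, 10_000, 100_000, 1_000_000]:
        ncp = mu / sigma * n**0.5                 # noncentrality of the test statistic
        power = stats.norm.sf(z_crit - ncp) + stats.norm.cdf(-z_crit - ncp)
        print(f"n = {n:>9,}: power ≈ {power:.3f}")   # roughly 0.05, 0.17, 0.89, 1.00

Fixed-n and sequential designs both hit 100% power eventually; the only question is how much data it takes.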
>"The fact that these variables are all typically linear or additive further implies that interactions between variables will be typically rare or small or both (implying that most such hits will be false positives, as interactions are far harder to detect than main effects)."
Where does this "fact" come from? And if everything is correlated with everything else all these effects are true positives...
Also, another ridiculous aspect of this is that when data becomes cheap the researchers just make the threshold stricter so it doesn't become too easy. They are (collectively) choosing what is "significant" or not and then acting like "significant" = real and "non-significant" = 0.
Finally, I didn't read through the whole thing. Does he claim to have found an exception to this rule at any point?
> Finally, I didn't read through the whole thing. Does he claim to have found an exception to this rule at any point?
Oakes 1975 points out that explicit randomized experiments, which test a useless intervention such as school reform, can be exceptions. (Oakes might not be quite right here, since surely even useless interventions have some non-zero effect, if only by wasting peoples' time & effort, but you might say that the 'crud factor' is vastly smaller in randomized experiments than in correlational data, which is a point worth noting.)
Thanks,
How about this "fact": The fact that these variables are all typically linear or additive?
Is this trying to be too clever? If the correlation is weaker than the random noise of the data, then it is equivalent to not being correlated.
Otherwise, we'd get conclusions like the color of your car influencing your risk of lung cancer or some such nonsense. With enough data, you could see a weak correlation between red cars and cancer, but it would still be insignificant. That's what the null hypothesis is for: to put a threshold under which we can just ignore whatever weak correlation seems to be there.
Question: Are these correlations typically transitive? That is to say, in addition to everything having a nonzero correlation with everything else, does it typically happen that the sign of the correlation between A and C is equal to the product of the signs of the correlations between A and B and between B and C?
Thorndike's dictum would suggest that this is so, at least in that particular domain. What about more generally?
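This doesn't answer the "typically" question, but as a quick sketch (toy model: B = A + C, with A and C mildly negatively correlated), sign-transitivity is not guaranteed in general, so if it holds in practice it would be for empirical rather than mathematical reasons:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 100_000
    cov = [[1.0, -0.3], [-0.3, 1.0]]
    a, c = rng.multivariate_normal([0, 0], cov, size=n).T
    b = a + c

    r_ab = np.corrcoef(a, b)[0, 1]   # ~ +0.59
    r_bc = np.corrcoef(b, c)[0, 1]   # ~ +0.59
    r_ac = np.corrcoef(a, c)[0, 1]   # ~ -0.30, breaking sign-transitivity
    print(r_ab, r_bc, r_ac)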
Like background radiation, we have an "absolute background" correlation value... a value we might test against, e.g. |±0.02321|.
Or we could drop the null
REJECT THE NULL HYPOTHESIS !!! :-)
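One way to operationalize that, sketched below as a minimum-effect test via the Fisher z transform (the threshold is just the value mentioned above, used purely for illustration, and min_effect_pvalue is a hypothetical helper, not a library function):

    import numpy as np
    from scipy import stats

    def min_effect_pvalue(r, n, rho0=0.02321):
        """Approximate one-sided p-value for H0: |rho| <= rho0 vs HA: |rho| > rho0."""
        z = (np.arctanh(abs(r)) - np.arctanh(rho0)) * np.sqrt(n - 3)
        return stats.norm.sf(z)

    # With n = 1,000,000, r = 0.02 easily rejects rho = 0 but not the background threshold.
    r, n = 0.02, 1_000_000
    print(min_effect_pvalue(r, n, rho0=0.0))   # ~0: trivially rejects the zero null
    print(min_effect_pvalue(r, n))             # ~1: cannot beat the crud threshold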
It's well known that the number of Nicolas Cage movies is correlated with a wide variety of natural phenomena.
Sample means and true means are different things.
You're being downvoted because you missed the point repeatedly made in the intro and many of the excerpts that this is in fact a claim about the 'true means'.
Cause, like, when you start learning about systems, everything is correlated, everything is connected, everything is linked, and you have to point it all out to everyone all the time.