
Comment by tibbar

4 years ago

Different coefficients help you look at different kinds of relationships. For example, Pearson's r tells you about linear relationships between variables -- it's the covariance normalized by the two standard deviations: "how useful is it to draw a straight line through these data points, and how accurate is interpolating along it likely to be?"
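To make that concrete, here's a minimal from-scratch sketch of Pearson's r using only the standard library (in practice you'd reach for something like scipy.stats.pearsonr):

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance of x and y divided by the product
    of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # roughly y = 2x, with a little noise
print(pearson_r(xs, ys))  # close to 1.0: a line fits these points well
```

An r near +1 or -1 says a straight line summarizes the data well; near 0 says a line buys you little.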

Spearman's correlation helps you understand monotonic (rank-order) relationships between variables: "Is there a trend where increasing X tends to also increase Y?" This makes it equally good at detecting linear, logarithmic, or exponential relationships, although it can't tell them apart.
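Spearman's rho is just Pearson's r computed on the ranks of the data, which is easy to sketch (assuming no ties, to keep the ranking trivial -- scipy.stats.spearmanr handles the general case):

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(vals):
    """1-based rank of each value (no tie handling, for simplicity)."""
    order = sorted(range(len(vals)), key=lambda i: vals[i])
    r = [0] * len(vals)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(xs, ys):
    # Spearman = Pearson applied to the ranks instead of the raw values
    return pearson_r(ranks(xs), ranks(ys))

xs = list(range(1, 11))
ys = [2 ** x for x in xs]     # exponential: strictly increasing, not linear
print(spearman_rho(xs, ys))   # exactly 1: perfect monotonic trend
print(pearson_r(xs, ys))      # noticeably below 1: a line fits poorly
```

The contrast at the end is the point: Spearman sees the exponential trend as perfect, while Pearson penalizes it for not being a line.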

Mutual information helps you understand statistical dependence in general -- how much knowing one variable reduces your uncertainty about the other -- which is the sort of unstructured signal that's useful in building decision trees. You can have high mutual information without any linear or monotonic relationship at all. So it's more general, while at the same time not telling you anything that would be helpful in building, for instance, a predictive multivariate linear model.
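A quick illustration of that last point: y = x² over a symmetric range has essentially zero Pearson correlation, yet y is completely determined by x. Below is a toy plug-in MI estimate after equal-width binning (a rough estimator just for illustration -- real code would use a library routine such as scikit-learn's mutual_info_score on discretized data):

```python
import math
from collections import Counter

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mutual_info(xs, ys, bins=4):
    """Toy plug-in MI estimate (in nats) after equal-width binning."""
    def binned(vals):
        lo, hi = min(vals), max(vals)
        w = (hi - lo) / bins or 1.0
        return [min(int((v - lo) / w), bins - 1) for v in vals]
    bx, by = binned(xs), binned(ys)
    n = len(xs)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # sum over observed cells of p(x,y) * log(p(x,y) / (p(x) p(y)))
    return sum((c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

xs = [x / 10 for x in range(-50, 51)]   # symmetric around 0
ys = [x * x for x in xs]                # parabola: no monotonic trend

print(pearson_r(xs, ys))    # essentially 0 -- no linear relationship
print(mutual_info(xs, ys))  # clearly positive -- y is determined by x
```

Pearson (and Spearman) see nothing here because the relationship reverses direction at x = 0, but mutual information still registers the strong dependence -- exactly the kind of split signal a decision tree can exploit.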

TL;DR: More specific coefficients leverage assumptions about the structure of the data (e.g. linearity), which can help you construct optimal versions of models under those assumptions. Mutual information doesn't make any assumptions about the structure of the data, so it won't feed directly into such a model, but it still has lots of applications!