Comment by nathan_compton

1 day ago

Really classic "rationalist" style writing: a soup of correct observations about statistical phenomena with chunks of weird political bullshit thrown in here and there. For example: "On a more contemporary note, these theoretical & empirical considerations also throw doubt on concerns about ‘algorithmic bias’ or inferences drawing on ‘protected classes’: not drawing on them may not be desirable, possible, or even meaningful."

This is such a bizarre sentence. The way it's tossed in, not explained in any way, not supported by references, etc. Like, I guess the implication being made is something like "because there is a hidden latent variable that determines criminality and we can never escape from correlations with it, it's OK to use 'is_black' in our black-box model which decides if someone is going to get parole"? Ridiculous. Does this really "throw doubt" on whether we should care about this?

The concerns about how models work are deeper than the statistical challenges of creating or interpreting them. For one thing, all the degrees of freedom we include in our model selection process allow us to construct models which do anything that we want. If we see a parole model which includes "likes_hiphop" as an explanatory variable we ought to ask ourselves who decided that should be there and whether there was an agenda at play beyond "producing the best model possible."

These concerns about everything being correlated actually warrant a much more careful understanding of the political ramifications of how we model, what we model, and which variables we use. They tell us that in almost any non-trivial case a model is, at least partly, necessarily a political object, almost certainly decorated (consciously or subconsciously) with some conception of how the world is or ought to be explained.

> This is such a bizarre sentence. The way it's tossed in, not explained in any way,

It reads naturally in context and is explained by the foregoing text. For example, the phrase "these theoretical & empirical considerations" refers to theoretical and empirical considerations described above. The basic idea is that, because everything correlates with everything else, you can't just look at correlations and infer that they're more than incidental. The political implications are not at all "weird", and follow naturally. The author observes that social scientists build complex models and observe huge numbers of variables, which allows them to find correlations that support their hypotheses; but these correlations, exactly because they can be found everywhere, are nowhere near as solid evidence as they are presented as being.

> Like, I guess the implication being made is something like "because there is a hidden latent variable that determines criminality and we can never escape from correlations with it, it's OK to use 'is_black' in our black-box model which decides if someone is going to get parole"?

No, not at all. The implication is that we cannot conclude that the black box model actually has an "is_black" variable, even if it is observed to have disparate impact on black people.

  • Sorry, but I don't think that is a reasonable read. The phrase "not drawing on them may not be desirable, possible, or even meaningful" is a political statement except perhaps for "possible," which is just a flat statement that it's hard to separate causal variables from non-causal ones.

    Nothing in the statistical observation that variables tend to be correlated suggests we should somehow reject the moral perspective that it's desirable for a model to be based on causal rather than merely correlated variables, even if finding such variables is difficult or even impossible to do perfectly. And it's certainly also _meaningful_ to do so, even if there are statistical challenges. A model based on "socioeconomic status" has a totally different social meaning than one based on race, even if we cannot fully disentangle the two statistically. He is mixing up statistical, social, moral, and even philosophical questions in a way which is, in my opinion, misleading.

    • Or maybe your own announced bias against “rationalists” is affecting your reading of this. I agree with GP's interpretation.

    • > it's desirable for a model to be based on causal rather than merely correlated variables

      Ironically, your "likes_hiphop" example would appear to be an unusually clean case of a variable that is likely to exert causal influence.

      What do you think the causal effect of listening to lyrics like "Prolly leave my fuckin' show in a cop car" might be on, say, an impressionable teenage boy?

      From one of the most-streamed hip-hop songs of all time:

      https://genius.com/Post-malone-rockstar-lyrics

      https://newsroom.spotify.com/2024-05-20/best-hip-hop-songs-1...

      > A model based on "socioeconomic status" has a totally different social meaning than one based on race, even if we cannot fully disentangle the two statistically.

      I see no evidence Gwern disagrees with this claim. He just seems to be arguing the "cannot fully disentangle the two statistically" part.

    • > Nothing in the statistical observation that variables tend to be correlated suggests we should somehow reject the moral perspective that it's desirable for a model to be based on causal rather than merely correlated variables, even if finding such variables is difficult or even impossible to do perfectly.

      Perfect is the enemy of good. That it would be desirable to construct a model based on causal variables is self-evident, but we don't have those, and if a correlative model can demonstrably improve people's material conditions, even if conditioned on variables you find "distasteful", what is your argument that such a model shouldn't be used?

"Rationalists" do seem to have a fetish for ranking people and groups of people. Oddly enough, they frequently use poorly performed studies and under-powered data to reach their conclusions about genetics and IQ especially.

> For example: "On a more contemporary note, these theoretical & empirical considerations also throw doubt on concerns about ‘algorithmic bias’ or inferences drawing on ‘protected classes’: not drawing on them may not be desirable, possible, or even meaningful."

As much as I do think that good, parsimonious social science modeling _requires_ theoretical commitments, the test is whether TFA would say the same thing about the political cause du jour: say, `is_white` in hiring at an organization that does outreach to minority communities.

Yes, this is gwern to a "T". Overwhelm with an r/iamverysmart screed whilst insidiously inserting baseless speculation and opinion as fact, as if the references provided cover those too. Weirdly, the scaling/AI community loves him.