Comment by zug_zug

2 years ago

I don't see it as something to be angry about. Probably what happened is that it was trained on some crappy stock images where every "doctor" was a white model, and they are trying to counteract that propensity to repeat the stereotype.

For what it's worth, if I ask it to draw doctors in Uganda/Siberia/Mexico/Sweden, it has no problem drawing a bunch of doctors all of the same race, if you really need an image of that.

Is it stereotype or statistics? If indeed x% of doctors are white, then that same amount should ideally be represented in the output, not "equal probability". Seek to change the cause, not to mask the effect.

  • But then it gets crazy. If I ask for a basketball player, should it be a black player with a certain probability? But high school and the NBA have very different distributions. And the Euro League has a very different distribution than the NBA, and the CBL in China even more so.

  • You may be working from the false assumption that the data set itself is balanced by demographics. It isn't the case that x% of images of doctors on the web are white because the same percentage of doctors are white; it's that most images of doctors are white because the image of a doctor (or any educated person) as white by default is normalized by Western (specifically American) society, and this prejudice is reflected in the internet data the model is trained on.

    Regardless of the statistics, it should be just as easy to generate the image of a white doctor as a black doctor. Both queries are straightforward and make linguistic sense. It doesn't follow that an image of a black doctor should be more difficult to create because, statistically speaking, black doctors are rarer. That the model has trouble even comprehending the concept of a "black doctor", much less something like a "black African doctor treating white kids" [0], is a problem rooted in the effect of racial stereotypes, albeit at several levels of abstraction above that of the software itself.

    [0] https://www.npr.org/sections/goatsandsoda/2023/10/06/1201840...

  • > then that same amount should ideally be represented in the output

    Why? Why should representation in the output reflect actual distributions of race?

    • I doubt anyone cares if you asked ChatGPT to create a picture of a basketball player and it returned an image of an Asian player.

      People don't like that it's rewriting prompts to force diversity. So if I ask for a black basketball player, it should return an image of exactly that.

    • This is a good question.

      If I'm asking for quicksort, do I want the most common implementations or do I want an underrepresented impl?

      If I'm asking for the history of Egypt, do I want the most common tellings or do I want some underrepresented narrative?

      I suppose something like the race of a doctor in some Dalle image ends up being a very special case in the scheme of things, since it's a case where we don't necessarily care.

      Maybe the steelman of the idea that you shouldn't special-case it can be drawn along these lines, too. But I think to figure that out you'd need to consider examples along the periphery that aren't so obvious, unlike "should a generated doctor be black?"

  • India and China alone ensure that the majority of the world's doctors are not white.

    But obviously x% of doctors are white.

  • The statistics (in the sample data) become the stereotype. If 99% of your samples are white doctors, maybe you will get a non-white doctor if you ask it to generate a picture of a group of 100 doctors. But if you ask it to generate a picture of a single doctor? It will generate a white one, 100% of the time, because each time the most probable skin color is white. Unless we tell it to inject some randomness, which is what the prompt is doing. A rough sketch of the idea is below.
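
    A hypothetical sketch of that point (made-up numbers, plain Python, not how any actual image model is implemented): always picking the most probable category never varies, while sampling in proportion to the data occasionally does.

      import random

      # Hypothetical attribute distribution in the training data (numbers are made up).
      skin_tone_dist = {"white": 0.99, "non-white": 0.01}

      def most_probable(dist):
          # Always return the single most likely category.
          return max(dist, key=dist.get)

      def sample(dist):
          # Draw a category in proportion to its frequency in the data.
          categories, weights = zip(*dist.items())
          return random.choices(categories, weights=weights, k=1)[0]

      # 100 independent "draw me one doctor" requests:
      print([most_probable(skin_tone_dist) for _ in range(100)].count("non-white"))  # 0, every run
      print([sample(skin_tone_dist) for _ in range(100)].count("non-white"))         # ~1, on average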

  • But... the "effect" is part of the cause...

    • You don't know that though.

      And there's evidence to the contrary. If you look at the career choices of women, to pick one contentious social issue at random, they tend to be different than the career choices of men, even in countries with a long history of gender equality.

      So if I ask ChatGPT to make me a picture of trash collectors or fishermen, it shouldn't rewrite my query to force x% of them to be women.

  • > then that same amount should ideally be represented in the output, not "equal probability".

    Yeah, but breaking down the actual racial distributions by career/time/region is a waste of time for people building AGI, so they threw it in the prompt and moved on to more important work.

  • Do you want to be right or do you want to make money? OpenAI wants to make money, so they’ll choose the output that will do that.

If you can ask it for a doctor of $race and get one, then why should it make any difference what gets generated when you don't specify? Once you start playing that game, there's no way to win.

  • Because it's not what their customers want.

    • Kinda citation needed time.

      It's not implausible that most people using ChatGPT want something socially neutral (by some definition) without needing to think about it too hard.

      Whether or not you think this is true, it's certainly plausible, so if you disagree you should probably back up your claim.