Comment by krapp

2 years ago

You may be working from the false assumption that the data set itself is balanced by demographics. It isn't the case that x% of images of doctors on the web show white doctors because the same percentage of doctors are white. Rather, most images of doctors show white people because the image of a doctor (or any educated person) as white by default is normalized by Western (specifically American) society, and this prejudice is reflected in the internet data the model is built from.

Regardless of the statistics, it should be just as easy to generate the image of a black doctor as a white doctor. Both queries are straightforward and make linguistic sense. It doesn't follow that an image of a black doctor should be more difficult to create just because, statistically speaking, black doctors are rarer. That the model has trouble even comprehending the concept of a "black doctor," much less something like a "black African doctor treating white kids[0]," is a problem rooted in the effect of racial stereotypes, albeit at several levels of abstraction above that of the software itself.

[0]https://www.npr.org/sections/goatsandsoda/2023/10/06/1201840...