Comment by rsynnott

1 year ago

Note:

> without some additional prompting or fine tuning that encourages it to do something else.

That tuning has been done for all major current models, I think? Certainly, early image generation models _did_ have issues in this direction.

EDIT: If you think about it, it's clear that this is necessary; a model which only ever produces the average/most likely thing based on its training dataset will produce extremely boring and misleading output (and the problem will compound as its output gets fed into other models...).

Why is it necessary? There are 1.4 billion Chinese, 1.4 billion Indians, 1.2 billion Africans, 0.6 billion Latinos, and 1 billion white people. Those numbers don't have to be perfect, and the white/non-white split isn't clean, but taken as is they suggest roughly 5 non-white nurses for every 1 white nurse. Maybe it's less, maybe more, but there's no way "white" should be the default.
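A rough back-of-envelope check of that ~5:1 ratio, using only the approximate regional population figures quoted above (these are the comment's coarse estimates, not actual nurse demographics):

```python
# Back-of-envelope check of the non-white : white ratio, using the
# approximate population figures from the comment above (in billions).
# These are coarse regional estimates, not data about nurses.
populations_bn = {
    "Chinese": 1.4,
    "Indian": 1.4,
    "African": 1.2,
    "Latino": 0.6,
    "white": 1.0,
}

non_white = sum(v for k, v in populations_bn.items() if k != "white")
ratio = non_white / populations_bn["white"]
print(f"non-white : white ~= {ratio:.1f} : 1")  # ~= 4.6 : 1, i.e. roughly 5:1
```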

  • But that depends on context. If I asked "please make a picture of a Nigerian nurse", then the result should overwhelmingly be a Black person. If I asked for a "picture of a Finnish nurse", then it should almost always be a white person.

    That probably can be done and may work well already, not sure.

    But the harder problem is that, since I'm from a country where at least 99% of nurses are white, it's really natural for me to expect a picture of a nurse to show a white person by default.

    But for a person from China, a picture of a nurse is probably expected to be of a Chinese person!

    But of course the model has no idea who I am.

    So, yeah, this seems like a pretty intractable problem to just DWIM. Then again, the whole AI thingie was an intractable problem three years ago, so...

    • > But of course the model has no idea who I am.

      I guess if Google provided the model with the same information it uses to target ads, then this would be pretty much achievable.

      However, I'm not sure I'd like such a personalised model. We have enough bubbles already, and they don't do much good. From this perspective, LLMs are refreshing in that, for now, they treat everyone the same.

  • If the training data was a photo of every nurse in the world, then that’s what you’d expect, yeah. The training set isn’t a photo of every nurse in the world, though; it has a bias.

  • Honest, if controversial, question: beyond virtue signaling, what problem is the debate around this topic intended to solve? What are we fixing here?

  • If the prompt is in English it should presume an American/British/Canadian/Australian nurse, and represent the diversity of those populations. If the prompt is in Chinese, the nurses should demonstrate the diversity of the Chinese speaking people, with their many ethnicities and subcultures.

    • Searching Google Images for "nurse" shows mostly non-white nurses for me. Whether Google search is showing the "average nurse" or has been tuned to be diverse, it seems like Gemini, made by Google, should have already known how to solve this?

    • > If the prompt is in English it should presume an American/British/Canadian/Australian nurse, and represent the diversity of those populations.

      Don't forget India, Nigeria, Pakistan, and the Philippines, all of which have more English speakers than any of those countries but the US.