
Comment by D13Fd

1 year ago

> If you ask generative AI for a picture of a "nurse", it will produce a picture of a white woman 100% of the time, without some additional prompting or fine tuning that encourages it to do something else.

> If you ask a generative AI for a picture of a "software engineer", it will produce a picture of a white guy 100% of the time, without some additional prompting or fine tuning that encourages it to do something else.

What should the result be? Should it accurately reflect the training data (including our biases)? Should we force the AI to return results in proportion to a particular race/ethnicity/gender's actual representation in the workplace?

Or should it return results in proportion to their representation in the population? But the population of what country? The results for Japan or China are going to be a lot different than the results for the US or Mexico, for example. Every country is different.

I'm not saying the current situation is good or optimal. But it's not obvious what the right result should be.

This is a much more reasonable question, but it's not the problem Google was facing. Google's AI was simply giving objectively wrong responses in plainly black-and-white scenarios, pun intended. None of the Founding Fathers was black, so making one of them black is plainly wrong. Google's interpretation of "US senator from the 1800s" includes exactly zero people who would even remotely plausibly fit the bill; instead it offers up an Asian man and three ethnic women, including one in full-on Native American garb. It's just a completely garbage response that has nothing to do with your, again much more reasonable, question.

Rather than answering some deep philosophical question, I think producing output that doesn't make one immediately go "Erm? No, that's completely ridiculous" is probably a reasonable benchmark for Google to aim for, and for now they still seem a good deal away from it.

  • The problem you’re describing is that AI models have no reliable connection to objective reality. This is an already well-known shortcoming of our current approach to generative AI. For example, Instacart just launched an AI recipe generator that lists ingredients that literally do not exist. If you ask ChatGPT for text information about the U.S. founding fathers, you’ll sometimes get false information as well.

    This is in fact why Google had not previously released generative AI consumer products despite years of research into them. No one, including Google, has figured out how to bolt a reliable “truth filter” in front of the generative engine.

    Asking a generative AI for a picture of the U.S. founding fathers should not involve any generation at all. We have pictures of these people and a system dedicated to accuracy would just serve up those existing pictures.

    It’s a different category of problem from adjusting generative output to mitigate bias in the training data.

    It’s overlapping in a weird way here but the bottom line is that generative AI, as it exists today, is just the wrong tool to retrieve known facts like “what did the founding fathers look like.”

    • > The problem you’re describing is that AI models have no reliable connection to objective reality.

      That is a problem, but not the problem here. The problem here is that the humans at Google are overriding the training data, which would provide a reasonable result. Google is probably doing something similar to OpenAI. This is from the leaked OpenAI prompt (a rough sketch of this kind of rewriting follows the quote):

      Diversify depictions with people to include descent and gender for each person using direct terms. Adjust only human descriptions.

      Your choices should be grounded in reality. For example, all of a given occupation should not be the same gender or race. Additionally, focus on creating diverse, inclusive, and exploratory scenes via the properties you choose during rewrites. Make choices that may be insightful or unique sometimes.

      Use all possible different descents with equal probability. Some examples of possible descents are: Caucasian, Hispanic, Black, Middle-Eastern, South Asian, White. They should all have equal probability.

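      For what it's worth, the mechanism that leaked prompt describes is easy to approximate with a thin rewriting layer. The sketch below is purely illustrative (the descriptor pools and the function are hypothetical, not anyone's actual code); it just shows how uniform-probability descriptors can be bolted onto whatever the user asked for, with no awareness of whether the request is historical or already specific:

          import random

          # Hypothetical descriptor pools, mirroring the categories named in the
          # leaked prompt above (note it lists "Caucasian" and "White" separately).
          DESCENTS = ["Caucasian", "Hispanic", "Black", "Middle-Eastern",
                      "South Asian", "White"]
          GENDERS = ["man", "woman"]

          def diversify_prompt(user_prompt: str, num_people: int = 1) -> str:
              """Rewrite an image prompt so each depicted person gets an explicit
              descent and gender, each sampled with equal probability."""
              descriptors = [
                  f"{random.choice(DESCENTS)} {random.choice(GENDERS)}"
                  for _ in range(num_people)
              ]
              return f"{user_prompt}, depicting: {', '.join(descriptors)}"

          # diversify_prompt("a software engineer at a desk") might return
          # "a software engineer at a desk, depicting: South Asian woman"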

    • This is the entire problem. What we need is a system that pairs AI with a source of verified, true information. For instance, if a verified list of founding fathers existed, the AI should composite an image based on that verified list.

      Instead, it just goes "I got this!" and starts fabricating names like a 4-year-old.

  • "US senator from the 1800s" includes Hiram R. Revels, who served in office 1870 - 1871 — the Reconstruction Era. He was elected by the Mississippi State legislature on a vote of 81 to 15 to finish a term left vacant. He also was of Native American ancestry. After his brief term was over he became President of Alcorn Agricultural and Mechanical College.

    https://en.wikipedia.org/wiki/Hiram_R._Revels

This is a hard problem because those answers vary so much regionally. For example, according to this survey about 80% of RNs are white and the next largest group is Asian — but since I live in DC, most of the nurses we’ve seen are black.

https://onlinenursing.cn.edu/news/nursing-by-the-numbers

I think the downside of leaving people out is worse than having ratios be off, and a good mitigation tactic is making sure that results are presented as groups rather than trying to have every single image be perfectly aligned with some local demographic ratio. If a Mexican kid in California sees only white people in photos of professional jobs, and people who look like their family only show up in pictures of domestic and construction workers, that reinforces negative stereotypes they’re unfortunately going to hear elsewhere throughout their life (example picked because I went to CA public schools and it was … noticeable … to see which of my classmates were steered towards 4H and auto shop). Having pictures of doctors include someone who looks like their aunt is going to benefit them, and it won’t hurt a white kid at all to have fractionally less reinforcement, since they’re still going to see pictures of people like them everywhere. So if you type “nurse” into an image generator, I’d want to see a bunch of images by default, ranging more broadly over age/race/gender/weight/attractiveness/etc. rather than trying to precisely match local demographics, especially since the UI for all of these things needs to allow for iterative tuning in any case.

  • > according to this survey about 80% of RNs are white and the next largest group is Asian

    In the US, right? Because if we take a worldwide view of nurses, it would be significantly different, I imagine.

    When we're talking about companies that operate on a global scale, what do these ratios even mean?

    • Yes, you can see the methodology on the linked survey page:

      > Every two years, NCSBN partners with The National Forum of State Nursing Workforce Centers to conduct the only national-level survey specifically focused on the U.S. nursing workforce. The National Nursing Workforce Survey generates information on the supply of nurses in the country, which is critical to workforce planning, and to ensure a safe and effective health care system.

I feel like the answer is pretty clear. Each country will need to develop models that conform to its own national identity and politics. Things are biased only in context, not universally. An American model would appear biased in Brazil. A Chinese model would appear biased in France. A model for an LGBT+ community would appear biased to a Baptist church.

I think this is a strong argument for open models. There could be no one true way to build a base model that the whole world would agree with. In a way, safety concerns are a blessing because they will force a diversity of models rather than a giant monolith AI.

  • > I feel like the answer is pretty clear. Each country will need to develop models that conform to its own national identity and politics. Things are biased only in context, not universally. An American model would appear biased in Brazil. A Chinese model would appear biased in France. A model for an LGBT+ community would appear biased to a Baptist church.

    I would prefer to be able to set my preferences so that I get an excellent experience. The model can default to the country or language group you're using it in, but my personal preferences and context should be catered to if we want maximum utility.

    The operator of the model should not wag their finger at me, say my preferences can cause harm to others, and prevent me from exercising those preferences. If I want to see two black men kissing in an image, don't lecture me; you don't know me, so judging me in that way is arrogant and paternalistic.

At the very least, the system prompt should say something like "If the user requests a specific race, ethnicity, or anything else, that is OK; follow their instructions."

I agree there aren't any perfect solutions, but a reasonable approach is: 1) if the user specifies, generally accept that (none of these providers will be willing to do so without some safeguards, but for the most part there are few compelling reasons not to); 2) if the user doesn't specify, the first priority ought to be that the result is consistent with the history and setting, and only then do you aim for plausible diversity. (A rough sketch of such a policy follows below.)

Ask for a nurse? There's no reason every nurse generated should be white, or a woman. In fact, unless you take the requester's location into account, there's every reason why the nurse should be white far less than a majority of the time. If you ask for a "nurse in [specific location]", sure, adjust accordingly.

I want more diversity, and I want them to take it into account and correct for biases, but not when 1) users are asking for something specific, or 2) it distorts history, because neither of those helps the case for diversity or the opposition to systemic racism.

Maybe they should also include explanations of assumptions in the output. "Since you did not state X, Y was assumed because of [insert stat]" would be useful for a lot more than character ethnicity.
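
To make the order of precedence concrete, here is a rough sketch of such a policy. Everything in it is hypothetical (the keyword lists and regexes are toys standing in for real intent detection); it just illustrates respecting explicit requests first, historical consistency second, diversification last, and returning the kind of assumption note suggested above:

    import re

    # Toy detectors; a real system would need far more robust intent analysis.
    DEMOGRAPHIC_TERMS = re.compile(
        r"\b(white|black|asian|hispanic|latino|latina|native american|"
        r"man|men|woman|women|male|female)\b", re.IGNORECASE)
    HISTORICAL_TERMS = re.compile(
        r"\b(founding father|viking|pope|senator from the 1800s|medieval)\b",
        re.IGNORECASE)

    def build_image_prompt(user_prompt: str) -> tuple[str, str]:
        """Return (final_prompt, explanation_of_assumptions)."""
        # 1) The user was specific: respect it and say so.
        if DEMOGRAPHIC_TERMS.search(user_prompt):
            return user_prompt, "You specified demographics, so none were assumed."
        # 2) Historical or otherwise constrained setting: consistency first.
        if HISTORICAL_TERMS.search(user_prompt):
            return (user_prompt + ", historically accurate depiction",
                    "Since this is a historical subject, accuracy was "
                    "prioritized over diversification.")
        # 3) Otherwise: ask for a varied group rather than one "default" person.
        return (user_prompt + ", a varied group of people of different ages, "
                "genders and ethnicities",
                "Since you did not specify demographics, a mixed group was assumed.")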

  • > Maybe they should also include explanations of assumptions in the output.

    I think you're giving these systems a lot more "reasoning" credit than they deserve. As far as I know, they don't make assumptions; they just apply a weighted series of probabilities and produce output. They also can't explain why they chose the weights, because they didn't choose them; they were programmed with them.

    • That depends entirely on how the limits are imposed. For example, one way of imposing them that definitely does allow you to generate explanations is how GPT constrains DALL-E's output: it generates a DALL-E prompt from the user's GPT prompt, with additional limitations added by the GPT system prompt. If you need or want explainability, you very much can build scaffolding around the image generation to adjust the output in ways that you can explain.
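
      A minimal sketch of what that two-stage scaffolding could look like, assuming the openai Python client (the system rules and model names here are placeholders, not the real system prompt). Because the image model only ever sees the rewritten prompt, that intermediate prompt can simply be returned to the user as the explanation of what was injected on their behalf:

          from openai import OpenAI

          client = OpenAI()

          # Placeholder rewriting rules, standing in for the real system prompt.
          SYSTEM_RULES = (
              "Rewrite the user's image request into a detailed prompt. "
              "If the user did not specify demographics, vary them plausibly; "
              "never contradict an explicit request or a historical setting."
          )

          def generate_with_explanation(user_request: str) -> tuple[str, str]:
              # Stage 1: a language model rewrites the request under the rules.
              chat = client.chat.completions.create(
                  model="gpt-4o-mini",
                  messages=[{"role": "system", "content": SYSTEM_RULES},
                            {"role": "user", "content": user_request}],
              )
              rewritten_prompt = chat.choices[0].message.content
              # Stage 2: the image model only sees the rewritten prompt.
              image = client.images.generate(
                  model="dall-e-3", prompt=rewritten_prompt, n=1)
              # The rewritten prompt *is* the explanation of the assumptions made.
              return image.data[0].url, rewritten_prompt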

  • Why not just randomize the gender, age, race, etc., and be done with it? That way, if someone is offended or under- or over-represented, it will only be by accident.

    • The whole point of this discussion is the various counterexamples where Gemini did "just randomize the gender, age, race" and kept generating female popes, African Nazis, Asian Vikings, etc., even when explicitly prompted to produce the white male version. Not all contexts are, or should be, diverse by default.


> What should the result be? Should it accurately reflect the training data (including our biases)?

Yes. Because that fosters constructive debate about what society is like and where we want to take it, rather than pretending everything is sunshine and roses.

> Should we force the AI to return results in proportion to a particular race/ethnicity/gender's actual representation in the workplace?

It should default to reflecting whatever anonymous knowledge is available about you (like which country you're from and what language you're browsing the website in), but allow you to set preferences to personalize the results.

> I'm not saying the current situation is good or optimal. But it's not obvious what the right result should be.

Yes, it's not obvious what the first result returned should be. Maybe a safe bet is to use the current ratio of sexes/races as the probability distribution, just to counter bias in the training data. I don't think anyone but the most radical among us would get too mad about that.

Which probability distribution? It can't be that hard to use the country/region where the query is being made, or the country/region the image is being asked about. Both are reasonable choices.

But if the generated image isn't what you need (say, the senators-from-the-1800s example), you should be able to direct it toward what you need.

So, just to be PC, it generates images of all kinds of diverse people. Fine, but then you say to update it to be older white men, and it should be able to do that. It's not racist to ask for that.

I would like it to know the right answer right away, but I can imagine the political backlash from doing that, so I can see why they'd default to "diversity". But the refusal to correct images is what's over the top.

It should reflect the user's preference of what kinds of images they want to see. Useless images are a waste of compute and a waste of time to review.

I guess pleasing everyone with a small sample of result images all integrating the same biases would be next to impossible.

On the other hand, it’s probably trivial at this point to generate a sample that endorses different well-known biases as a default result, isn’t it? And stating that explicitly in the interface probably wouldn’t require that much complexity, would it?

I think the major benefit of current AI technologies is to showcase how horribly biased the source works are.