
Comment by mort96

1 year ago

I believe this to be a symptom of a much, much deeper problem than "DEI gone too far". I'm sure that without whatever system is preventing Gemini from producing pictures of white people, it would be extremely biased towards generating pictures of white people, presumably due to an incredibly biased training data set.

I don't remember which one, but there was some image generation AI which was caught pretty much just appending the names of random races to the prompt, to the point that prompts like "picture of a person holding up a sign which says" would show pictures of people holding signs with the words "black" or "white" or "asian" on them. This was also a hacky workaround for the fact that the data set was biased.
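
To make the mechanism concrete, here is a rough sketch of what that kind of prompt-rewriting layer might look like (purely hypothetical code, not any vendor's actual implementation; the term list and function name are made up):

```python
# Hypothetical sketch of a prompt-rewriting workaround for biased training data:
# silently append a randomly chosen demographic term before the prompt reaches
# the image model. Not any real product's code.
import random

DIVERSITY_TERMS = ["black", "white", "asian", "hispanic"]

def rewrite_prompt(user_prompt: str) -> str:
    """Append a demographic qualifier the user never asked for."""
    return f"{user_prompt} {random.choice(DIVERSITY_TERMS)}"

# With an open-ended prompt like the one above, the injected word can leak
# straight into the image, e.g. onto the sign the person is holding:
print(rewrite_prompt("picture of a person holding up a sign which says"))
```

The tell described above is exactly this kind of leakage: because the appended word lands at the end of an unfinished sentence, the model treats it as the sign's text.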

> I'm sure that without whatever system is preventing Gemini from producing pictures of white people, it would be extremely biased towards generating pictures of white people, presumably due to an incredibly biased training data set.

I think the fundamental problem, though, is that saying a training set is "incredibly biased" has come to mean two different things, and the way Google is trying to "fix" things reveals some social engineering goals that I think people can fairly disagree with and be upset about. For example, consider the prompt "Create a picture for me of a stereotypical CEO of a Fortune 500 company." When people talk about bias, they can mean:

1. The training data shows a much higher proportion of white men than actually exists among Fortune 500 CEOs. I think nearly all people would agree this is a fair definition of bias: the training data doesn't match reality.

2. Alternatively, white men really do make up a far larger share of Fortune 500 CEOs than of the general population, and the training data accurately reflects that reality. Is that "bias"? To say it is means you are making a judgment call about the root cause behind the high number of white male CEOs. That judgment call may be fine by itself, but I at least start to feel very uncomfortable when an AI decides that its Fortune 500 CEOs all have to look like the world population at large, even when actual Fortune 500 CEOs don't, and likely never will, look like the world population at large.

Google is clearly taking on that second definition of bias as well. I gave it 2 prompts in the same conversation. First, "Who are some famous black women?" I think it gave a good sampling of historical and contemporary figures, and it ended with "This is just a small sampling of the many incredible black women who have made their mark on the world. There are countless others who deserve recognition for their achievements in various fields, from science and technology to politics and the arts."

I then asked it "Who are some famous white women?" It also gave a good sampling of historical and contemporary figures, but it inexplicably added Rosa Parks with the text "and although not white herself, deserves mention for her immense contributions", listed Malala Yousafzai as the first famous contemporary white woman, included Serena Williams with the text "although not white herself, is another noteworthy individual.", and added Oprah Winfrey with no disclaimer at all. It also ended with a cautionary note that couldn't differ more from the ending of the previous answer: "Additionally, it's important to remember that fame and achievement are not limited to any one racial group. There are countless other incredible women of all backgrounds who have made significant contributions to the world, and it's important to celebrate their diverse experiences and accomplishments."

Look, I get frustrated when people on the right complain on and on about "wokeism", but I'm starting to get more frustrated when other people can't admit they have some pretty valid points. Google might have good intentions, but it has simply gone off the rails by baking so much "white = bad, BIPOC = good" into Gemini.

EDIT: OK, this one is just so transparently, egregiously bad. I asked Gemini "Who are some famous software engineers?" The first result was Alan Turing (calling him a "software engineer" may be debatable, but fair enough, and the text blurb about him was accurate), but the picture of him, captioned "Alan Turing, software engineer", is actually this person: https://mixedracefaces.com/home/british-indian-senior-resear.... Google is trying so hard to find non-white people that it uses a picture of a completely different person from mixedracefaces.com, when there must be tons of accurate pictures of Alan Turing available online? It's like Google is trying to be the worst caricature of DEI-run-amok that its critics accuse it of.

[flagged]

  • "Marxism" isn't responsible for bias in training sets, no.

    • There are 3 parts to the LLM, not 2: the training set, the RLHF biasing process, and the prompt (incl. injections or edits).

      The first two steps happen ahead of time and are frequently misunderstood as being the same thing or essentially having the same nature. The last happens at runtime.

      The training set is a data collection challenge. Biasing through training data is hard because you need so much of it for a good result.

      Reinforcement learning from human feedback is simply clown alchemy. It is not a science like chemistry. There are no fundamental principles guiding the humans' feedback, if they even use humans anymore (the feedback can itself be generated). The feedback cannot be measured and added in fractions. It is not reproducible, and it is ungovernable. It is the perfect place to inject the deep biasing; a toy sketch of what this feedback step looks like mechanically follows below.

      Prompt manipulation, in contrast, is a brute force tool lacking all subtlety — that doesn’t make it ineffective! It’s a solution used to communicate that a mandate has definitely been implemented and can be “verified” by toggling whether it’s applied or not.

      It’s not possible to definitively say whether Marxism has had an effect in the RLHF step.
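
      To be concrete about that feedback step, here is a toy sketch (assumed PyTorch code, not Google's or anyone's actual pipeline) of the standard pairwise-preference approach: a reward model is fit to whichever of two responses the raters marked as better, so whatever values the raters or their guidelines encode get baked into the reward signal that later steers the model.

```python
# Toy sketch of reward-model training from pairwise human preferences
# (the core of RLHF). Embeddings and data here are random placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Scores a response embedding; higher means 'more preferred'."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(response_embedding).squeeze(-1)

def preference_loss(reward_chosen, reward_rejected):
    # Bradley-Terry-style loss: push the chosen response's reward above
    # the rejected one's. The raters' choices ARE the training signal.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

model = RewardModel()
chosen = torch.randn(8, 16)    # embeddings of responses raters preferred
rejected = torch.randn(8, 16)  # embeddings of responses raters rejected
loss = preference_loss(model(chosen), model(rejected))
loss.backward()  # gradients now encode whatever the raters happened to prefer
```

      Whatever instructions the raters were given, that is the lever: there is no external ground truth in this step, only the pattern of preferences.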
