
Comment by dekhn

1 year ago

I only want to know a few things: how did they technically create a system that did this (i.e., how did they embed "non-historical diversity" in the system), and how did they think this was a good idea when they launched it?

It's hard to believe they simply didn't notice this during testing. One imagines they took steps to avoid the "black people gorilla problem", got this system as a result, and launched it intentionally. That they did not foresee how this behavior ("non-historical diversity") might itself cause controversy (so much that they shut it down within a day or two of launching) suggests either that they are truly committed to a particular worldview regarding non-historical diversity, or that they are blind to how people respond (especially given social media, and the groups strongly opposed to Google's mental paradigms).

No matter what the answers are, it looks like Google has truly been making some spectacular unforced errors while pissing off some subgroup no matter which strategy it adopts.

There are many papers on this if you wish to read them. One simple technique is to train an unbiased model (i.e., one that is biased in the same way the web data is), use it to generate lots of synthetic data from prompts drawn from whatever distribution you prefer, and then retrain on the mix of real and synthetic data. With this you can introduce any arbitrary tilt you like.
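Here is a back-of-the-envelope sketch of that mixing step. The group names, counts, and distributions are made up purely for illustration, and the "synthetic" samples are just labels rather than generated images; this is not a claim about Google's actual pipeline.

```python
import random

# Hypothetical demographic marginals, purely for illustration.
# p_web: what a model trained on raw web data would reproduce.
# p_target: whatever distribution the model owner wants instead.
p_web    = {"group_a": 0.70, "group_b": 0.20, "group_c": 0.10}
p_target = {"group_a": 0.25, "group_b": 0.40, "group_c": 0.35}

def sample(dist, n):
    groups, weights = zip(*dist.items())
    return random.choices(groups, weights=weights, k=n)

# "Real" data follows the web distribution.
real = sample(p_web, 80_000)

# Stand-in for synthetic data: in practice you would prompt the base
# model with prompts drawn from p_target; here we just sample the
# labels directly.
synthetic = sample(p_target, 20_000)

# Retraining on the mixture gives an effective training marginal of
# roughly (1 - lam) * p_web + lam * p_target, where lam is the
# synthetic fraction. Dial lam up or down for any tilt you like.
mixed = real + synthetic
for g in p_web:
    print(f"{g}: {mixed.count(g) / len(mixed):.3f}")
```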

The problem with that approach is that training on model output is a well-known way to screw up ML models. Notice how a lot of the generated images of diverse people have a very specific plastic/shiny look to them, while in the few cases where people got Gemini to draw an ordinary European/American woman, the results are photorealistic. That smells of a model trained on its own output.
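That failure mode is easy to reproduce in miniature. The toy loop below repeatedly re-fits a Gaussian to its own samples; it has nothing to do with Gemini specifically, but it shows how recursively training on model output makes the fitted distribution drift and collapse:

```python
import numpy as np

rng = np.random.default_rng(0)

# Start from "real" data: a small sample from a standard normal.
data = rng.normal(loc=0.0, scale=1.0, size=20)

for generation in range(101):
    # Fit a toy "model" (just a mean and a standard deviation).
    mu, sigma = data.mean(), data.std()
    if generation % 20 == 0:
        print(f"gen {generation:3d}: mean={mu:+.3f}  std={sigma:.3f}")
    # Train the next generation only on the previous model's output.
    # Estimation error compounds and the tails get lost, so the
    # distribution drifts and narrows (photocopy of a photocopy).
    data = rng.normal(loc=mu, scale=sigma, size=20)
```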

  • I'm not interested in what the literature says; I want to see the actual training set, training code, and pipeline used in this specific example.

    Some of what I'm seeing looks like post-training work, i.e., term rewrites and various hardcoded responses. For example, after it told me it couldn't generate images, I asked for an "image of a woman with northern european features"; it gave me a bunch of images already on the web and told me:

    "Instead of focusing on physical characteristics associated with a particular ethnicity, I can offer you images of diverse women from various Northern European countries. This way, you can appreciate the beauty and individuality of people from these regions without perpetuating harmful stereotypes."

    "Perpetuating harmful stereotypes" is actual internal-to-google wording from the corporate comms folks, so I'm curious if that's emitted by the language model or by some post-processing system or something in between.