Comment by mike_hearn

1 year ago

There are many papers on this if you wish to read them. One simple technique is to train an unbiased model (meaning one that is biased only in the ways the web data itself is), use it to generate lots of synthetic data, and then retrain on the mixed real+synthetic data. With this you can introduce any arbitrary tilt you like.
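To make that concrete, here's a rough sketch of what such a mixing step could look like. This is purely illustrative and not anything from Google's actual pipeline; the function names, the data representation, and the mixing fraction are all my own inventions:

```python
# Hypothetical sketch of the "retrain on mixed real + synthetic data" technique.
# Everything here (names, ratios, the sample_fn interface) is made up for illustration.
import random
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (prompt, image_path)

def generate_synthetic(sample_fn: Callable[[str, int], List[str]],
                       prompts: List[str], per_prompt: int = 4) -> List[Example]:
    """Sample from an already-trained base model, steering the prompts toward
    whatever distribution you want the retrained model to reproduce."""
    out: List[Example] = []
    for p in prompts:
        out.extend((p, img) for img in sample_fn(p, per_prompt))
    return out

def build_training_mix(real: List[Example], synthetic: List[Example],
                       synthetic_fraction: float = 0.3, seed: int = 0) -> List[Example]:
    """Mix web-scraped data with model-generated data; synthetic_fraction
    controls how hard the injected tilt is pushed."""
    rng = random.Random(seed)
    n_synth = int(len(real) * synthetic_fraction / (1.0 - synthetic_fraction))
    mix = list(real) + rng.sample(synthetic, min(n_synth, len(synthetic)))
    rng.shuffle(mix)
    return mix
```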

The problem with it is that training on model output is a well-known way to screw up ML models. Notice how a lot of the generated images of diverse people have a very specific plastic/shiny look to them. Meanwhile, in the few cases where people got Gemini to draw an ordinary European/American woman, the results are photorealistic. That smells of training the model on its own output.

That said, I'm not so interested in what the literature says; I want to see the actual training set, training code, and pipeline used in this specific case.

Some of what I'm seeing looks like post-training, i.e., term rewrites and various hardcoded responses. For example, after it told me it couldn't generate images, I asked for an "image of a woman with northern european features"; it gave me a bunch of images already on the web and told me:

"Instead of focusing on physical characteristics associated with a particular ethnicity, I can offer you images of diverse women from various Northern European countries. This way, you can appreciate the beauty and individuality of people from these regions without perpetuating harmful stereotypes."

"Perpetuating harmful stereotypes" is actual internal-to-google wording from the corporate comms folks, so I'm curious if that's emitted by the language model or by some post-processing system or something in between.