
Comment by dmitrygr

1 year ago

If I ask for a picture of a thug, I would not be surprised if the result is statistically accurate, and thus I don’t see a 90-year-old white-haired grandma. If I ask for a picture of an NFL player, I would not object to all results being bulky men. If most nurses are women, I have no objection to a prompt for “nurse” showing a woman. That is a fact, and no amount of your righteousness will change it.

It seems that your objection is to using existing, accurate, factual, and historical data to represent reality? That really is more of a personal problem, and probably should not be projected onto others?

You conveniently use mild examples when I'm talking about harmful stereotypes. Reinforcing bulky NFL players won't lead to much; reinforcing stereotypes about minorities can lead to lynchings or ethnic cleansing in some part of the world.

I don't object to anything, and definitely don't side with Google on this solution. I just agree with the parent comment saying it's a subtle problem.

By the way, the data fed to AIs is neither accurate nor factual; its bias has been proven again and again. Even if we're talking about data from studies (like the example I gave), its context is always important, and AIs neither give nor understand that context.

And again, there is the open question of: do we want to use the average representation every time? If I'm teaching my kid that stealing is bad, should the generated person be of a specific race because a 2014 study showed that group was more prone to stealing in a specific American state? Does it matter to the lesson I'm giving?

  • > can lead to lynchings or ethnic cleansing in some part of the world

    Have we seen any lynchings based on AI imagery?

    No

    Have we seen students use Google as an authoritative source?

    Yes

    So I'd rather students see something realistic when asking for "founding fathers". And yes, if a given race/sex/etc. is very overrepresented in a given context, it SHOULD be shown. The world is as it is. Hiding it is self-deception and will only lead to issues. You cannot fix a problem if you deny its existence.

> If most nurses are women, I have no objection to a prompt for “nurse” showing a woman.

But if you're generating four images, it would be good to have three women instead of four, just for the sake of variety. More varied results can be better, as long as they're not incorrect and as long as you don't get lectured if you ask for something specific.

From what I understand, if you train a model on data with 90% female nurses or 90% white software engineers, it's likely to spit out 99% or more female nurses or white software engineers. So there is an actual need for an unbiasing process; it's just that this one was doing a really bad job in terms of accuracy and obedience to the requests.
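
To make the amplification concrete, here's a toy sketch. It's a guess at one plausible mechanism, not how any particular image model actually works: if generation is mode-seeking (e.g. sampling at a temperature below 1, or with strong guidance), it sharpens whatever imbalance the model learned, so a faithfully learned 90/10 split can come out as roughly 99/1.

    # Toy illustration only: mode-seeking sampling (temperature < 1) sharpens
    # a learned 90/10 split into roughly 99/1. All numbers are made up.
    import random

    def sharpen(probs, temperature):
        """Re-weight a learned distribution as p_i^(1/T) / sum_j p_j^(1/T)."""
        weights = [p ** (1.0 / temperature) for p in probs]
        total = sum(weights)
        return [w / total for w in weights]

    learned = [0.9, 0.1]             # the model faithfully learned the 90/10 split
    sampled = sharpen(learned, 0.5)  # but generation sharpens it
    print(sampled)                   # ~[0.988, 0.012], i.e. "99% or more"

    # Draw 4 images, as in the example above
    random.seed(0)
    counts = [0, 0]
    for _ in range(4):
        counts[0 if random.random() < sampled[0] else 1] += 1
    print(counts)                    # very likely [4, 0]: all four from the majority group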

  • > So there is an actual need

    You state this as a fact. Is it?

    • If a generator cannot produce a result that was in the training set because it over-weights the most common samples, then yes. If something was in 10% of the inputs and is produced in 1% of the outputs, there is a problem (a rough sketch of that input-vs-output check is below).

      I am pretty sure that it's possible to do it in a better way than by mangling prompts, but I will leave that to more capable people. Possible doesn't mean easy.
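
      For what it's worth, the check implied by that 10%-in / 1%-out comparison is easy to sketch. The labels, counts, and threshold below are made up for illustration; real ones would come from annotating training data and generated outputs:

          from collections import Counter

          def amplification_report(train_labels, output_labels, min_ratio=0.5):
              """Flag attributes whose share of outputs falls far below their share of inputs."""
              train_freq = Counter(train_labels)
              out_freq = Counter(output_labels)
              for label, count in train_freq.items():
                  p_in = count / len(train_labels)
                  p_out = out_freq.get(label, 0) / len(output_labels)
                  flag = "  <-- underrepresented" if p_out < min_ratio * p_in else ""
                  print(f"{label}: {p_in:.0%} of inputs, {p_out:.0%} of outputs{flag}")

          # Hypothetical tallies: 10% of training images show male nurses,
          # but only 1% of generated images do -> flagged.
          train = ["female nurse"] * 90 + ["male nurse"] * 10
          generated = ["female nurse"] * 99 + ["male nurse"] * 1
          amplification_report(train, generated)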