Comment by ethbr1

1 year ago

> It seems perfectly reasonable to say that generated imagery should attempt to not lean into stereotypes and show a diverse set of people.

When stereotypes clash with historical facts, facts should win.

Hallucinating diversity where there was none simply sweeps historical failures under the rug.

If it wants to take a situation where diversity is possible and highlight that diversity, fine. But that seems a tall order for LLMs these days, as it's getting into historical comprehension.

>Hallucinating diversity where there was none simply sweeps historical failures under the rug.

Failures and successes. You can't get this thing to generate any white people at all, no matter how explicitly or implicitly you ask.

I think the root problem is assuming that these generated images are representations of anything.

Nobody should.

They’re literally semi-random graphic artifacts that we humans give 100% of the meaning to.

  • So you're saying whatever the model generates doesn't have to be tethered to reality at all? I wonder if you think the same about ChatGPT. Should it just make up whatever it wants when asked a question like "why does it rain?" After all, you could say the words it generates are also a semi-random sequence of letters that humans give meaning to.

    • I think going to a statistics-based generator with the intention of taking what you see as an accurate representation of reality is a non-starter.

      The model isn’t trying to replicate reality; it’s trying to minimize some error metric (a rough sketch of what that means is at the end of this comment).

      Sure, it may be inspired by reality, but it should never be considered an authority on reality.

      And yes, the words an LLM writes have no meaning. We assign meaning to the output. There was no intention behind them.

      The fact that some models can perfectly recall _some_ information that appears frequently in the training data is a happy accident. Remember, transformers were initially designed for translation tasks.
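
      To be concrete about "some error metric": for a language model the usual objective is just how well it predicts the next token of its training text, nothing about matching reality. A minimal sketch (plain NumPy, made-up numbers, not any real model's code):

          import numpy as np

          def cross_entropy(predicted_probs, true_token_id):
              # The "error metric": how surprised the model was by the token
              # that actually came next in the training data.
              return -np.log(predicted_probs[true_token_id])

          # Made-up distribution over a tiny 4-token vocabulary, plus the
          # token that actually followed in the training text.
          predicted = np.array([0.1, 0.6, 0.2, 0.1])
          print(cross_entropy(predicted, true_token_id=1))  # ~0.51

      Minimizing that number rewards reproducing whatever the training text says, true or not.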

    • > Do you think it should just make up whatever it wants when asked a question like "why does it rain?"

      Always doing that would be preferable to the status quo, where it does it just often enough to do damage while retaining a veneer of credibility.

  • > They’re literally semi-random graphic artifacts that we humans give 100% of the meaning to.

    They're graphic artifacts generated semi-randomly from a training set of human-created material.

    That's not quite the same thing, as otherwise the "adjustment" here wouldn't have been considered by Google in the first place.

      The fact that the training data is human-curated arguably further removes the generations from representing reality (as we see with this whole little controversy).

      I think, with respect to the point I was making, they are the same thing.

  • But then if it simply reflected reality, there would also be no problem, right, because it’s a synthetically generated output? Like if instead of people it output animals, or if it drew representative data from actual sources to answer the question. In either case it should be “ok” because it’s generated? They might as well output Planet of the Apes or Starship Troopers bugs…

  • With emphasis on the "semi-". They are very good at following prompts, and so overplaying the "random" part is dishonest. When you ask it for something, and it follows your instructions except for injecting a bunch of biases for the things you haven't specified, it matters what those biases are.

    • Are they good at following prompts?

      Unless I format my prompts very specifically, diffusion models are not good at following them. Even then I need to constantly tweak my prompts and negative prompts to zero in on what I want.

      That process is novel and pretty fun, but it doesn’t imply the model is good at following my prompt.

      LLMs are similar. Initially they seem good at following a prompt, but continue the conversation and they start showing recall issues, knowledge gaps, improper formatting, etc.

      It’s not dishonest to say semi-random. It’s accurate. The decoding step of inference, for example, takes a sample from a probability distribution that the model generates. Literally stochastic (see the sketch below).
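
      A minimal sketch of that sampling step (plain NumPy, hypothetical logits, not any particular model's actual code):

          import numpy as np

          def sample_next_token(logits, temperature=1.0, rng=np.random.default_rng()):
              # Turn raw scores (logits) into a probability distribution (softmax).
              scaled = np.asarray(logits, dtype=np.float64) / temperature
              probs = np.exp(scaled - scaled.max())
              probs /= probs.sum()
              # The draw itself is stochastic: the same logits can yield different tokens.
              return rng.choice(len(probs), p=probs)

          # Hypothetical logits over a tiny 4-token vocabulary.
          print([sample_next_token([2.0, 1.0, 0.2, -1.0]) for _ in range(5)])

      Greedy decoding (always taking the most probable token) would remove that randomness, but sampling is the common default.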

Why should facts win? It's art, and there are no rules in art. I could draw a black George Washington too.

[edit]

Statistical inference machines following human-language prompts that include "please" and "thank you" have absolutely no idea what a fact is.

"A stick bug doesn't know what it's like to be a stick."

  • If there are no rules in art, then white George Washington should be acceptable.

    But I would counter that there are certainly rules in art.

    Both historical (expectations and real history) and factual (humans have a number of arms less than or equal to 2).

    If you ask Gemini to give you an image of a person and it returns a Pollock drip work... most people aren't going to be pleased.

  • Art doesn't have to be tethered to reality, but I think it's reasonable to assume that a generic image-generation AI should generate images according to reality. There are no rules in art, but people would be pretty baffled if every image generated by Gemini defaulted to Dr. Seuss's art style. If they called it "Dr. Seuss AI" I don't think anyone would care. Likewise, if they explicitly labeled Gemini as "diverse image generation" or whatever, most of the backlash would evaporate.

  • If you try to draw white George Washington but the markers you use keep spitting out different colors from the ones you picked, you’d throw out the entire set and stop buying that brand of art supplies in the future.

  • Because white people exist and it refuses to draw them when asked explicitly. It doesn’t refuse for any other race.