Comment by snowwrestler

2 years ago

The problem you’re describing is that AI models have no reliable connection to objective reality. This is a shortcoming of our current approach to generative AI that is very well known already. For example Instacart just launched an AI recipe generator that lists ingredients that literally do not exist. If you ask ChatGPT for text information about the U.S. founding fathers, you’ll sometimes get false information that way as well.

This is in fact why Google had not previously released generative AI consumer products despite years of research into them. No one, including Google, has figured out how to bolt a reliable “truth filter” in front of the generative engine.

Asking a generative AI for a picture of the U.S. founding fathers should not involve any generation at all. We have pictures of these people and a system dedicated to accuracy would just serve up those existing pictures.

It’s a different category of problem from adjusting generative output to mitigate bias in the training data.

It’s overlapping in a weird way here but the bottom line is that generative AI, as it exists today, is just the wrong tool to retrieve known facts like “what did the founding fathers look like.”

The problem you’re describing is that AI models have no reliable connection to objective reality.

That is a problem, but not the problem here. The problem here is that the humans at Google are overriding the training data which would provide a reasonable result. Google is probably doing something similar to OpenAI. This is from the OpenAI leaked prompt:

Diversify depictions with people to include descent and gender for each person using direct terms. Adjust only human descriptions.

Your choices should be grounded in reality. For example, all of a given occupation should not be the same gender or race. Additionally, focus on creating diverse, inclusive, and exploratory scenes via the properties you choose during rewrites. Make choices that may be insightful or unique sometimes.

Use all possible different descents with equal probability. Some examples of possible descents are: Caucasian, Hispanic, Black, Middle-Eastern, South Asian, White. They should all have equal probability.

  • That is an example of adjusting generative output to mitigate bias in the training data.

    To you and I, it is obviously stupid to apply that prompt to a request for an image of the U.S. founding fathers, because we already know what they looked like.

    But generative AI systems only work one way. And they don’t know anything. They generate, which is not the same thing as knowing.

    One could update the quoted prompt to include “except when requested to produce an image of the U.S. founding fathers.” But I hope you can appreciate the scaling problem with that approach to improvements.

    • What you're suggesting is certainly possible - and no doubt what Google would claim. But companies like Google could trivially obtain massive representative samples for training of basically every sort of endeavor and classification of humanity throughout all of modern history on this entire planet.

      To me, this feels much more like Google intentionally trying to bias what was probably an otherwise representative sample, and hilarity ensuing. But it's actually quite sad too. Because these companies are really butchering what could be amazing tools for visually exploring our history - "our" being literally any person alive today.

This is the entire problem. What we need is a system that is based on true information paired with AI. For instance, if a verified list of founding fathers existed, the AI should be compositing an image based on that verified list.

Instead, it just goes "I got this!" and starts fabricating names like a 4 year old.