Comment by cubefox
4 days ago
Hot take: text-to-image models should be biased toward photorealism. This is because if I type in "a cat playing piano", I want to see something that looks like a 100% real cat playing a 100% real piano. Because, unless specified otherwise, a "cat" is trivially something that looks like an actual cat. And a real cat looks photorealistic. Not like a painting, or cartoon, or 3D render, or some fake almost-realistic-but-clearly-wrong "AI style".
FYI: photorealism is art that imitates photos, and I see the term misused a lot both in comments and prompts (where you'll actually get suboptimal results if you say "photorealism" instead of describing the camera that "shot" it!)
I meant it here in the sense of "as indistinguishable from a photo as the model can make it".
"Style" is apt for many reasons.
I've heard chairs of animation departments say they feel like this puts film departments under them as a subset rather than the other way around. It's a funny twist of fate, given that the tables turned on them ages ago.
Photorealistic models are just learning the rules of camera optics and physics. In other "styles", the models learn how to draw Pixar shaded volumes, thick lines, or whatever rules and patterns and aesthetics you teach.
Different styles can reinforce one another across stylistic boundaries, and mixed data sets can improve generalization (at the cost of excelling in any one domain).
"Real life", it seems, might just be a filter amongst many equally valid interpretations.
As Midjourney has demonstrated, the median user of AI image generation wants those aesthetic dreamy images.
I think it's more likely this is just a niche that Midjourney has occupied.
If Midjourney is a niche, then what is the broader market for AI image generation?
Porn, obviously, though if you look at what's popular on civitai.com, a lot of it isn't photo-realistic. That might change once photo-realistic models are fully out of the uncanny valley.
Presumably personalized advertising, but this isn't something we've seen much of yet. Maybe this is about to explode into the mainstream.
Perhaps stock-photo type images for generic non-personalized advertising? This seems like a market with a lot of reach, but not much depth.
There might be demand for photos of family vacations that didn't actually happen, or removing erstwhile in-laws from family photos after a divorce. That all seems a bit creepy.
I could see some useful applications in education, like "Draw a picture to help me understand the role of RNA." But those don't need to be photo-realistic.
I'm sure people will come up with more and better uses for AI-generated images, but it's not obvious to me there will be more demand for images that are photo-realistic, rather than images that look like illustrations.