Comment by doctorpangloss

1 day ago

The original meme about the limitations of diffusion was the text to image prompt, “a horse riding an astronaut.”

It’s in all sorts of papers. This guy Gary Marcus used to be a big crank about AI limitations and was the “being wrong on the Internet” guy who brought a lot of mainstream attention to the problem - https://garymarcus.substack.com/p/horse-rides-astronaut. Not sure how much we hear from him nowadays.

The “astronaut riding a horse” thing comes from how 10-1,000x more people are doing this stuff now, and they kind of process the whole zeitgeist from before their arrival through fuzzy glasses. The irony is that it is the human, not the generator, that got confused by the purposefully out-of-sample “horse riding an astronaut” prompt and changed it to “astronaut riding a horse.”

I was under the impression that the astronaut riding a horse was in use prior to Marcus's tweet. Even that Substack post has him complaining about how Technology Review is acting as a PR firm for OpenAI, and the article he is complaining about shows an astronaut riding a horse. I mean, that image was in the announcement blog post[0].

Certainly Marcus's tweets played a role in the popularity of the image, but I'm not convinced this is the single causal root.

[0] https://openai.com/index/dall-e-2/

This whole “horse riding an astronaut” thing was a bit dumb in the first place, because AFAIK CLIP (the text encoder used in first-generation diffusion models) doesn't really distinguish between the two prompts. (So that Marcus guy was right: the tech employed was fundamentally unable to do what he asked it to do.)

> The irony is it is the human, not the generator, that got confused about the purposefully out of sample horse riding an astronaut prompt, and changed it to astronaut riding a horse.

You're mixing things up: “astronaut riding a horse” was used by OpenAI in their DALL-E 2 announcement blog post. “Horse riding an astronaut” only came after, and had a much more niche audience anyway, so it's absolutely not an instance of “humans got caught by an out-of-sample instance and misremembered”.