Comment by jimbobthemighty

6 days ago

I asked Gemini for a video of 'pelican riding a unicycle in hyde park' - I was blown away by the output:

https://gemini.google.com/share/55e250c99693

According to OP:

> Why this test? Because pelicans are hard to draw, bicycles are hard to draw, pelicans can’t ride bicycles... and there’s zero chance any AI lab would train a model for such a ridiculous task.

At this juncture I'm left wondering why competing AI labs wouldn't train for this now well-known "test".

  • Given their proclivity to scrape the entire contents of the internet, it's only a matter of time, intentional or otherwise.

    I've heard the same has happened with common benchmarks (they've ingested solutions into training data).

Graphically perfect, but content-wise nonsense. The pelican's center of gravity is clearly behind the wheel. It needs to be above or very slightly ahead of the wheel.

  • I don't think it's graphically perfect either.

    The length of the pedals keeps changing, and you'll notice that neither of the pedals actually rotates around the hub: consistent with your point about the center of gravity being too far back, the circle the pedals are making is also shifted back too far.

  • Still impressed. And, to be honest, I don't think that this problem matters much. Physical accuracy is very nice, but it is not the most important aspect when I watch a fantasy movie, for example. Or even a sci-fi one.

Google/Gemini has pretty impressive audiovisual capabilities. I tried to have Claude add mulch to a landscape picture and it looked like someone hit it with the orange spray paint tool in MS Paint. Nano Banana actually produced something fairly realistic.

That’s really impressive, and slightly worrying for creatives involved in film, animation or modelling.

  • Even more worrying are the implications for fake news, propaganda, fraud, deception and mental health.

    • This is really my biggest worry when it comes to consumer AI. People already have a hard time informing themselves properly. Now we have technology that just boosts the confirmation bias people already have. It's sickening.

  • It’s the opposite: non-creatives (if such roles even exist in those industries) should be worried. All those models offset technical skills, allowing you to get from idea to implementation through a different route (which can be easier or harder depending on the idea and the model - good luck tweaking that pelican’s exact pose and movements to match your imagination precisely). Nothing touches creativity, not even in the slightest.

    But there’s a lot of panicking, fear-mongering and all sorts of nonsense around this whole subject.

    • My mother has started watching 100% AI generated stories on YouTube. They are good enough to be entertaining even if they include random errors like messing up the main character’s name.

      The thing is, the creative economy is all about people’s attention and pocketbooks; it doesn’t need to be great, just good enough.

    • That’s really not how this is going to play out.

      When advertising agencies, for example, see that their copywriter can go from idea to concept with a video generator instead of engaging an animator, they’ll simply cut out the middleman who used to create that animation for them and use the tool instead, even if the content isn’t as good (the quality of this one is really pretty good, though there are obvious problems). They’ll happily accept mediocrity to save money.

      People will still create adverts but quality and creativity will go down and a lot of jobs are going to be suddenly displaced.

    • Does "creative" mean that you are creative at coming up with ideas or does it mean that you are artistic and can create stuff?

      I suppose it is more the latter, and it's the artistic people who create stuff who will suffer. The ones who come up with ideas but previously couldn't create because they lacked the skill might win thanks to AI.

      Coming up with ideas is easy, creating and putting in the effort is hard (until we had AI).

      Probably the value of created stuff will go down rapidly because there will be so much of it.

  • I wouldn't be that concerned that animation is going anywhere. Both outputs look really off, especially around the feet.

    • In a serious creative tool you would also want a lot more creative input. At a minimum the ability to steer the animation with skeletons that feed into a control net, or something like that. And the ability to control the look and feel and create much more consistent characters. Both things that exist in good tooling, but both things that create work that will keep animators employed. But it will dramatically reduce the number of animators needed to reach a given level of "good enough".

      And looking at the trajectory of the animation industry, I don't think increases in productivity will be used to raise the quality of the animation if the alternative is to just pay fewer animators

    • Yes sure if you look closely it’s slop, but a huge number of companies and advertisers just don’t care (and they feel the same about their social media content, blogs and yes code) - they will attempt to cut corners where they can to the detriment of true artists.

      But yes, for anyone who does this for a living there will be obvious deficiencies, especially when you try to do something truly novel, intentional and interesting and don’t quite want what it produces.

      But in this area they have made quite a lot of progress.

Only SVG counts though, don't know why.

  • Willison chose this task because (unlike actual images of pelicans) it was clearly not in the training data, but could be reasoned about and composed from what's there. But just like those "how many golf balls can you fit in a 747?" interview questions, it should now be retired.

    • Thank you for the reply. Would something like a squirrel flying a hang glider as an SVG be a good new test? Or would that be indirectly in the training data too?

  • It's a test of text-based LLMs to see how good they are at SVG geometry. Video models are a different category of software entirely.