Comment by jimbobthemighty

6 days ago

I asked Gemini for a video of 'pelican riding a unicycle in hyde park' - I was blown away by the output:

https://gemini.google.com/share/55e250c99693

47 comments

jimbobthemighty

According to OP:

> Why this test? Because pelicans are hard to draw, bicycles are hard to draw, pelicans can’t ride bicycles... and there’s zero chance any AI lab would train a model for such a ridiculous task.

At this juncture I'm left wondering why competing AI labs wouldn't train for this now well known "test".

nijave 5 days ago

Given their proclivity to scrape the entire contents of the internet, it's only a matter of time, intentional or otherwise.
I've heard the same has happened with common benchmarks (they've ingested solutions into training data).

sfdlkj3jk342a 6 days ago

I'm surprised by Grok as well:

https://grok.com/imagine/post/8d1eab88-737f-4d46-ba92-9b6502...

Interesting that it does better at making the pelican pedal in the video generation than in image generation.

IdiotSavage 5 days ago

Graphically perfect, but content-wise nonsense. The pelican's center of gravity is clearly behind the wheel. It needs to be above or very slightly ahead of the wheel.

horsawlarway 5 days ago
I don't think it's graphically perfect either.
The length of the pedals keeps changing, and you'll notice that neither of the pedals actually rotates around the hub: consistent with your point about the center of gravity being too far back, the circle the pedals are making is also shifted back too far.
- navane 5 days ago
  
  Oh those pedals go all over the place indeed
ciberado 5 days ago

Still impressed. And, to be honest, I don't think this problem matters much. Physical accuracy is very nice, but it is not, for example, the most important aspect when I watch a fantasy movie. Or even a sci-fi one.
djeastm 5 days ago

Maybe the pelican has something heavy in its mouth.
mycall 5 days ago

I do hope that JEPA can help resolve the nonsense from AI models.

nijave 5 days ago

Google/Gemini has pretty impressive audio visual capabilities. I tried to have Claude add mulch to a landscape picture and it looked like someone hit it with the orange spray paint tool in MS Paint. Nano Banana actually produced something fairly realistic

grey-area 6 days ago

That’s really impressive, and slightly worrying for creatives involved in film, animation or modelling.

notachatbot123 6 days ago
Even more worrying are the implications for fake news, propaganda, fraud, deception and mental health.
- sevenzero 6 days ago
  
  This is really my biggest worry when it gets to consumer AI. People already have a hard time informing themselves properly. Now we have technology that just boosts the already existing confirmation bias people have. It's sickening.
- dzhiurgis 6 days ago
  
  Maybe short term yes. But longer term people will finally put their guard up against deception that’s been around for decades.
  
drdaeman 6 days ago
It’s the opposite, non-creatives (if such roles even exist in those industries) should be worried. All those models offset technical skills, allowing one to get from idea to implementation through a different route (which can be easier or harder depending on idea and model - good luck tweaking that pelican’s exact pose and movements to match your imagination precisely). Nothing touches creativity, not even in the slightest.
But there’s a lot of panicking, fear-mongering and all sorts of nonsense around this whole subject.
- Retric 6 days ago
  
  My mother has started watching 100% AI generated stories on YouTube. They are good enough to be entertaining even if they include random errors like messing up the main character’s name.
  The thing is, the creative economy is all about people’s attention and pocketbooks; it doesn’t need to be great, just good enough.
  
- colinb 6 days ago
  
  The truly excellent weavers will be fine?
- grey-area 6 days ago
  
  That’s really not how this is going to play out.
  When advertising agencies for example see that their copywriter can go from idea to concept with a video generator instead of engaging an animator, they’ll simply cut the middleman who used to create that animation for them and use the tool instead, even if the content isn’t as good (though the quality of this one is really pretty good, there are obvious problems). They’ll happily accept mediocrity to save money.
  People will still create adverts but quality and creativity will go down and a lot of jobs are going to be suddenly displaced.
- flakeoil 6 days ago
  
  Does "creative" mean that you are creative at coming up with ideas or does it mean that you are artistic and can create stuff?
  I suppose it is more the latter, and it's the artistic people who create stuff who will suffer. The ones coming up with ideas, who previously couldn't create because they lacked skill, might win thanks to AI.
  Coming up with ideas is easy, creating and putting in the effort is hard (until we had AI).
  Probably the value of created stuff will go down rapidly because there will be so much of it.
AussieWog93 6 days ago
I wouldn't be that concerned that animation is going anywhere. Both outputs look really off, especially around the feet.
- wongarsu 6 days ago
  
  In a serious creative tool you would also want a lot more creative input. At a minimum the ability to steer the animation with skeletons that feed into a control net, or something like that. And the ability to control the look and feel and create much more consistent characters. Both things that exist in good tooling, but both things that create work that will keep animators employed. But it will dramatically reduce the number of animators needed to reach a given level of "good enough".
  And looking at the trajectory of the animation industry, I don't think increases in productivity will be used to raise the quality of the animation if the alternative is to just pay fewer animators.
- grey-area 6 days ago
  
  Yes sure if you look closely it’s slop, but a huge number of companies and advertisers just don’t care (and they feel the same about their social media content, blogs and yes code) - they will attempt to cut corners where they can to the detriment of true artists.
  But yes, for anyone who does this for a living there will be obvious deficiencies, especially when you try to do something truly novel, intentional and interesting and don’t quite want what it produces.
  But in this area they have made quite a lot of progress.
hackable_sand 5 days ago

It's really not

ionwake 6 days ago

only SVG counts tho, don't know why

falcor84 6 days ago
Willison chose this task because (unlike actual images of pelicans) it was clearly not in the training data, but could be reasoned about and composed from what's there. But just like those "how many golf balls can you fit in a 747?" interview questions, it should now be retired.
- ionwake 5 days ago
  
  Thank you for the reply. Would something like a squirrel flying a hang glider as an SVG be a good new test? Or would that be indirectly in the training data too?
simonw 5 days ago

It's a test of text-based LLMs to see how good they are at SVG geometry. Video models are a different category of software entirely.

songting591 6 days ago

[flagged]