Comment by sempron64

6 days ago

The pelican has looked very same-y across all frontier models, same color bike, same camera angle, etc. I suspect this challenge is already too embedded in the training data to be a good signal when it succeeds, and maybe even when it fails in pathological ways mirroring existing AI pelicans on the internet.


h4ny 5 days ago

Was it ever a good test? How do you even objectively assess what a good pelican on a bike is anyway?

fwipsy 5 days ago

SVG generation is a good test because it's extremely easy to subjectively assess with visual reasoning where humans are strong. However, pelican on a bike specifically may be overused at this point.

Fuzzwah 5 days ago

The "big beak!" comment in the SVG source makes me think it's definitely a gamed "benchmark" at this point.

kayge 5 days ago

Do you think the models are ready for the next level? I believe that would be: Pelican feeding Spaghetti to Will Smith.

quantumwoke 5 days ago

Variations of this comment have been posted for over a year. The pelican has now morphed into part of HN culture rather than a legitimate benchmark, but it's still valuable as a meme.

brazukadev 5 days ago

It is more an example of gaming (the HN system) than a meme.

stratos123 5 days ago

I'd be very surprised if this is in the training data given that most models mess it up to this day. E.g. look at the ones from Opus.

tripleee 5 days ago

[flagged]

yreg 5 days ago
I really don't understand what's interesting about this test and why it is always at the top.
- simonw 5 days ago
  
  It's funny.
  
- depr 5 days ago
  
  Same reason you would always see the same top comments on reddit during a certain era.
  
- inglor_cz 5 days ago
  
  It has become a funny meme, much like "My hovercraft is full of eels!"
- luqtas 5 days ago
  
  Because you still can't ask LLMs to port DOOM to hardware X or Y.
- WithinReason 5 days ago
  
  It's a meme, and HN loves upvoting memes. Just like Reddit!
port11 5 days ago

The ultimate measure of an LLM is whether it can produce a capable image of a pelican riding a bicycle. All other use cases are but a distraction!
scrollaway 5 days ago
Do you seriously have a dedicated “bad takes on AI” HN account?
- tripleee 5 days ago
  
  yeah, although I do combine it with "replies to snarky questions" for efficiency
jurgenaut23 5 days ago

True that