Comment by simonw

14 hours ago

Accessed via OpenRouter, this one decided to wrap the SVG pelican in HTML with controls for the animation speed: https://gisthost.github.io/?ecaad98efe0f747e27bc0e0ebc669e94...

Transcript and HTML here: https://gist.github.com/simonw/ecaad98efe0f747e27bc0e0ebc669...

47 comments


FlyingSnake 14 hours ago

At this point drawing these Pelicans must be in the training data sets.

scosman 12 hours ago
not if I can help it!
https://github.com/scosman/pelicans_riding_bicycles
- AmbroseBierce 9 hours ago
  
  I hereby certify that these are indeed the most perfect and precise SVG depictions of a pelican riding a bicycle, also known among biology scholars as pelycles
- wvlia5 8 hours ago
  
  Just a few years ago, this would have been a meaningless repo.
- justinclift 8 hours ago
  
  That's truly a wonderful collection of pelicans riding bicycles.
  Much Win! ;)
- ValentineC 4 hours ago
  
  These are amazing. I smiled after I saw just how wonderfully rendered they are.
- razodactyl 7 hours ago
  
  These pelicans are clearly indicative of good RL training algorithms.
- smcleod 10 hours ago
  
  This is pretty funny
- ahmadyan 9 hours ago
  
  I love it!
- icelancer 11 hours ago
  
  love this adversarial work
  
abustamam 8 hours ago
Could be! Simon wrote about that here though https://simonwillison.net/2025/Nov/13/training-for-pelicans-...
- stingraycharles 5 hours ago
  
  > If a model finally comes out that produces an excellent SVG of a pelican riding a bicycle you can bet I’m going to test it on all manner of creatures riding all sorts of transportation devices.
  This relies on the false premise that, if they included it in their training dataset, the result would be perfect. All they need is to be good enough and better than the others, not perfect.
  
BrokenCogs 10 hours ago

Yes we all know that, but we still like to see the pelicans because it's a tradition more or less
ffsm8 14 hours ago
Clearly not.
I mean the prompt was succinct and clear, as always - and it still decided to hallucinate multiple features (animation + controls) beyond the prompt.
I'd also like to point out that to date no drawing has actually been good from a quality perspective (as in, compared to what a decent designer would throw together).
They're always only "good" from the perspective of being a one-shot, low-effort prompt. Very little content for training purposes.
- nwienert 14 hours ago
  
  The way I’ve come to think of LLMs is that what they produce in a single reply, even with thinking turned up, is akin to what you’d do in a single short session of work.
  And so if you ask it to do something big it will do a very surface level implementation. But if you have it iterate many times, or give it small pieces each time, you’ll end up with something closer to what a human would do.
  I imagine the pelican test but done in a harness that has the agents iterate 10+ times would be closer to what you’d expect, especially if a visual model was critiquing each time.
  
- serial_dev 12 hours ago
  
  What does good even mean… I have no idea what a good “pelican on a bike” should look like. It’s a fun prompt because there are no good answers… at least so I thought.
  
GorbachevyChase 6 hours ago

I’m OK with a Chinese model getting the W. It’s ultimately good for all of us.

SwellJoe 14 hours ago

We got an overachiever, here. Kimi sounds like a teacher's pet kind of name.

subscribed 13 hours ago

Underappreciated comment

makingstuffs 2 hours ago

It looks like a drunk pelican rolling downhill on its bicycle

HarHarVeryFunny 12 hours ago

Too bad they didn't put equal effort into the pelican's legs and feet. Left leg paralyzed and not moving, and right ankle flipping around in alarming fashion!

disiplus 11 hours ago

was part of the beta, it's a properly good model, in some sense i forgot that i'm not on opus or gpt. opus is still better. gpt is the one struggling for me. it has some niche in backend work but you can get the same with opus with skills, it's lacking in almost all others.

OtomotO 10 hours ago

Funny, for me Opus is struggling since about February.
4.7 made no difference, so for the first time in many moons I am cancelling my subscription.

hn8726 14 hours ago

[flagged]

lambda 14 hours ago

It's a lighthearted, fun, visual benchmark that's not part of the standard benchmarks; and at least traditionally, it was not something that the labs trained on so it was something of a measure of how well the intelligence of the model generalized. Part of the idea of LLMs is that they pick up general knowledge and reasoning ability, beyond any tasks that they are specifically trained for, from the vast quantity of data that they are trained on.
Of course, a while back there was a Gemini release that I believe specifically called out their ability to produce SVGs, for illustration and diagramming purposes. So it's no longer necessarily the case that the labs aren't training on generating SVGs, and in fact, there's a good chance that even if they're not doing so explicitly, the RLVR process might be generating tasks like that as there is more and more focus on frontend and design in the LLM space. So while they might not be specifically training for a pelican riding a bicycle, they may actually be training on SVG diagram quality.
nickthegreek 13 hours ago

This isn't even a normal pelican image post, this one created the html control system that animates the distance the wing travels from its pivot in time with the rotation of the wheel speed. Let's not pretend this is a solved problem and models are dumping about perfect pelicans on bikes one after another (or ever?).
Surely, you know someone makes the same post you did every time one is posted. Surely you see the answers and pushback since you are familiar with these posts. Genuine question, did you expect a different answer this time?
hamdouni 14 hours ago
Maybe this can help
https://simonwillison.net/2024/Oct/25/pelicans-on-a-bicycle/
- hn8726 11 hours ago
  
  It doesn't, I get that it's _a_ benchmark. It's just not a good or insightful one, and having it posted so often on HN feels like low quality spam at this point
  
walthamstow 13 hours ago

It's a great filter for people who take things far too seriously
Strom 13 hours ago
It's tradition at this point. Based on the upvotes the comment receives, it looks like many readers find value in it.
- hn8726 11 hours ago
  
  Upvotes are cheap; the fact that something is upvoted doesn't mean it's valuable (see: Reddit). Another question is how insightful the discussion under a typical pelican comment is (and how much of it is related to the pelican versus how often it's just where the general discussion happens).
  
- charcircuit 12 hours ago
  
  [flagged]
  
renewiltord 4 hours ago

Every forum gets regulars and their fan clubs. If you go to /r/comics and look at top for the month you'll see 4 out of 5 are pizzacakecomic. People on these forums sort of form a fanclub around 'their guy'. This forum's guy is this chap. Not much point being upset about it, tbh.
Mashimo 13 hours ago

I, for one, find it entertaining.
wotsdat 14 hours ago

[dead]
rolymath 13 hours ago

[flagged]
snendroid-ai 12 hours ago
[flagged]
- Mashimo 12 hours ago
  
  Well clearly some people care.