not if I can help it!
https://github.com/scosman/pelicans_riding_bicycles
I hereby certify that these are indeed the most perfect and precise SVG depictions of a pelican riding a bicycle, also known among biology scholars as pelycles.
Just a few years ago, this would have been a meaningless repo.
That's truly a wonderful collection of pelicans riding bicycles.
Much Win! ;)
These are amazing. I smiled after I saw just how wonderfully rendered they are.
These pelicans are clearly indicative of good RL training algorithms.
This is pretty funny
I love it!
love this adversarial work
Yeah, putting the captcha on there to thwart the LLMs' ability to extract good pelicans was a really good idea.
Shhhhh, they're going to be on to us.
Could be! Simon wrote about that here though https://simonwillison.net/2025/Nov/13/training-for-pelicans-...
> If a model finally comes out that produces an excellent SVG of a pelican riding a bicycle you can bet I’m going to test it on all manner of creatures riding all sorts of transportation devices.
This relies on the false premise that, if they included it in their training dataset, the result would be perfect. All they need is to be good enough and better than the others, not perfect.
I'm not sure if we can have a "perfect" Pelican riding a bicycle. Like, I could probably commission a highly experienced artist to draw one and I don't think it would be perfect. The legs would probably have to be too long, or pedals oddly placed, or handles strange, or wings with hands.
Based on the one Simon commented though, I'd say we're in decent territory to try the latter part of his hypothesis.
Yes we all know that, but we still like to see the pelicans because it's a tradition more or less
Clearly not.
I mean the prompt was succinct and clear, as always - and it still decided to hallucinate multiple features (animation + controls) beyond the prompt.
I'd also like to point out that to date no drawing has actually been good from a pure quality perspective (as in, compared to what a decent designer would throw together).
They're always only "good" from the perspective of being a one-shot, low-effort prompt. Very little content for training purposes.
The way I’ve come to think of LLMs is that what they produce in a single reply, even with thinking turned up, is akin to what you’d do in a single short session of work.
And so if you ask it to do something big, it will do a very surface-level implementation. But if you have it iterate many times, or give it small pieces each time, you’ll end up with something closer to what a human would do.
I imagine the pelican test but done in a harness that has the agents iterate 10+ times would be closer to what you’d expect, especially if a visual model was critiquing each time.
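To make the harness idea concrete, here's a rough sketch of what I mean, not anything that exists. The names `refine`, `toy_generate`, and `toy_critique` are all made up for illustration. In a real setup the generator would be an LLM call and the critic would be a vision model looking at the rendered SVG; here both are dumb deterministic stand-ins so the loop runs on its own.

```python
from typing import Callable, Tuple

# Hypothetical signatures: neither corresponds to a real library API.
Generator = Callable[[str, str, str], str]    # (prompt, previous_svg, feedback) -> svg
Critic = Callable[[str], Tuple[float, str]]   # svg -> (score in [0, 1], feedback)


def refine(prompt: str, generate: Generator, critique: Critic,
           rounds: int = 10, good_enough: float = 0.9) -> Tuple[str, float]:
    """Generate, critique, and regenerate until the score is good enough."""
    svg, feedback = "", ""
    best_svg, best_score = "", float("-inf")
    for _ in range(rounds):
        svg = generate(prompt, svg, feedback)
        score, feedback = critique(svg)
        if score > best_score:
            best_svg, best_score = svg, score
        if score >= good_enough:
            break
    return best_svg, best_score


# Toy stand-ins (assumptions, not real models): the "critic" only checks
# that certain element ids exist, and the "generator" adds one per round.
REQUIRED = ["wheel", "frame", "pelican", "beak"]


def toy_generate(prompt: str, previous_svg: str, feedback: str) -> str:
    if not previous_svg:
        return '<svg xmlns="http://www.w3.org/2000/svg"><g id="wheel"/></svg>'
    if not feedback:
        return previous_svg
    first_missing = feedback.split(",")[0]
    return previous_svg.replace("</svg>", f'<g id="{first_missing}"/></svg>')


def toy_critique(svg: str) -> Tuple[float, str]:
    missing = [part for part in REQUIRED if f'id="{part}"' not in svg]
    return 1 - len(missing) / len(REQUIRED), ",".join(missing)


final_svg, final_score = refine("a pelican riding a bicycle",
                                toy_generate, toy_critique)
print(final_score, final_svg)
```

The loop keeps the best-scoring draft rather than the last one, since a later round isn't guaranteed to be an improvement.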
Yeah, this is how I use AI. Instead of a single session one-shot, it's usually limited to single targeted edits, and then I steer it on each step. Takes longer but the output is actually what I want.
What does good even mean… I have no idea what a good “pelican on a bike” should look like. It’s a fun prompt because there are no good answers… at least so I thought.
Yeah that was exactly Simon's intent. https://simonwillison.net/2025/Nov/13/training-for-pelicans-...
I’m OK with a Chinese model getting the W. It’s ultimately good for all of us.