Comment by simonw
9 hours ago
I like the pelican I got out of deepseek-v4-flash more than the one I got from deepseek-v4-pro.
https://simonwillison.net/2026/Apr/24/deepseek-v4/
Both generated using OpenRouter.
For comparison, here's what I got from DeepSeek 3.2 back in December: https://simonwillison.net/2025/Dec/1/deepseek-v32/
And DeepSeek 3.1 in August: https://simonwillison.net/2025/Aug/22/deepseek-31/
And DeepSeek v3-0324 in March last year: https://simonwillison.net/2025/Mar/24/deepseek/
No way. The Pro pelican is fatter, has a customized front fork, and the sun is shining! He’s definitely living the best life.
The pro pelican is a work of art! It goes to dimensions no other LLM has gone before.
yeah. look at these 4 feathers (?) on his bum too.
a lot of dumplings
This is just a random thought, but have you tried doing an 'agentic' pelican?
As in have the model consider its generated SVG, and gradually refine it, using its knowledge of the relative positions and proportions of the shapes generated, and have it spin for a while, and hopefully the end result will be better than just oneshotting it.
Or maybe going even one step further - most modern models have tool use and image recognition capabilities - what if you have it generate an SVG (or parts/layers of it, as per the model's discretion) and feed it back to itself via image recognition, and then improve on the result.
I think it'd be interesting to see, as for a lot of models, their oneshot capability in coding is not necessarily correlated with their in-harness ability, and the latter is what really matters.
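The loop being proposed can be sketched roughly like this. Note this is a hypothetical illustration, not anyone's actual harness: `generate_svg()` and `critique()` are stand-ins for real model calls (e.g. a vision-capable model via the OpenRouter API), stubbed out here so the control flow runs standalone.

```python
def generate_svg(prompt, feedback=None):
    # Stub: a real version would call an LLM with the prompt
    # (plus any feedback from the previous round).
    base = '<svg xmlns="http://www.w3.org/2000/svg"><!-- pelican -->'
    if feedback:
        base += f"<!-- revised after feedback: {feedback} -->"
    return base + "</svg>"

def critique(svg):
    # Stub: a real version would render the SVG to a PNG and
    # feed the image back to the model for visual critique.
    # Returns None when the model is satisfied with the result.
    return None if "revised" in svg else "rear wheel intersects the frame"

def refine(prompt, max_rounds=3):
    # Generate, critique, and regenerate until the critique
    # passes or we run out of rounds.
    svg = generate_svg(prompt)
    for _ in range(max_rounds):
        feedback = critique(svg)
        if feedback is None:
            break
        svg = generate_svg(prompt, feedback)
    return svg

result = refine("Generate an SVG of a pelican riding a bicycle")
```

With the stubs above, the loop terminates after one revision; with real model calls, the interesting question is whether the critique step actually improves the drawing rather than converging on something blander.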
I tried that for the GPT-5 launch - a self-improving loop that renders the SVG, looks at it and tries again - and the results were surprisingly disappointing.
I should try it again with the more recent models.
I see, thanks. I guess most current models are not yet trained for this loop.
Could you please try with Opus 4.7? I think there's a chance of it doing well, considering the design/vision focus.
The Flash one is pretty impressive. Might be my favorite so far in the pelican-riding-a-bicycle series
DeepSeek pelicans are the angriest pelicans I’ve seen so far.
they're just late for work.
They're stressed pelicans from Hangzhou.
996 Pelican, lol
Being a bicycle geometry nerd I always look at the bicycle first.
Let me tell you how much the Pro one sucks... It looks like a failed Pedersen[1]. The rear wheel intersects with the bottom bracket, so it wouldn't even roll. Or rather, this bike couldn't exist.
The flash one looks surprisingly correct, with some wild fork offset and the slackest of seat tubes. It's got some lowrider[2] aspirations with the small wheels, but with longer, Rivendellish[3], chainstays. The seat post has a different angle than the seat tube, so good luck lowering that.
[1] https://en.wikipedia.org/wiki/Pedersen_bicycle
[2] https://en.wikipedia.org/wiki/Lowrider_bicycle
[3] https://www.rivbike.com/
This is an excellent comment. Thanks for this - I've only ever thought about whether the frame is the right shape, I never thought about how different illustrations might map to different bicycle categories.
Some other reactions:
I wonder which model will try some more common spoke lacing patterns. Right now there seems to be a preference for radial lacing, which is not super common (but simple to draw). The Flash and Pro ones use 16-spoke rims, which actually exist[1] but are not super common.
The Pro model fails badly at the spokes. Heck, the spokes sit on the outside of the drive side of the rim and tire. Have a nice ride riding on the spokes (instead of the tire) welded to the side of your rim.
Both bikes have the drive side on the left, which is very very uncommon. That can't exist in the training data.
[1] https://cicli-berlinetta.com/product/campagnolo-shamal-16-sp...
The Pedersen looks like someone failed the "draw a bicycle" test and decided to adjust the universe.
I think the pelican on a bike is known widely enough that it ceases to be useful as a benchmark. There is even a pelican briefly appearing in the promo video of GPT-5, if I'm not mistaken: https://openai.com/gpt-5/. So the companies are apparently aware of it.
It was a bigger deal in the Gemini 3.1 launch: https://x.com/JeffDean/status/2024525132266688757
What was your prompt for the image? Apologies if this should be obvious.
>Generate an SVG of a pelican riding a bicycle
at the top of the linked pages.
To me this is the perfect proof that
1) LLMs are not AGI. Surely if they were AGI, Pro would do better than Flash?
2) and because of the above, the pelican example is most likely already being benchmaxxed.
Is it then DeepSeek hosted by DeepSeek?
How much does the drawing change if you ask it again?
I really like the pro version. The pelican is so cute.
Where is the GPT 5.5 Pelican?
https://news.ycombinator.com/item?id=47879092#47880421
In the 5.5 topic
Why they so angry?
[flagged]
It's just Simon Willison (the person you are replying to) who always makes a pelican, as his personal flippant benchmark. It's not that deep.
No benchmark will be perfect, especially if it's public but it's a fun experiment to visually see how these models get better and better.
Why is it so wrong?
Thanks for the "scientific air" remark, that gave me a genuine LOL.
"The difference between screwing around and science is writing it down" -- Adam Savage
This should not be the top comment on every model release post. It's getting tiring.
This should be the bottom comment on the pelican comment on every model release post.
Clearly the top comment should be "Imagine a beowulf cluster of Deepseek v4!"
2 replies →