Comment by simonw
9 hours ago
I like the pelican I got out of deepseek-v4-flash more than the one I got from deepseek-v4-pro.
https://simonwillison.net/2026/Apr/24/deepseek-v4/
Both generated using OpenRouter.
For comparison, here's what I got from DeepSeek 3.2 back in December: https://simonwillison.net/2025/Dec/1/deepseek-v32/
And DeepSeek 3.1 in August: https://simonwillison.net/2025/Aug/22/deepseek-31/
And DeepSeek v3-0324 in March last year: https://simonwillison.net/2025/Mar/24/deepseek/
No way. The Pro pelican is fatter, has a customized front fork, and the sun is shining! He’s definitely living the best life.
The pro pelican is a work of art! It goes to dimensions no other LLM has gone before.
yeah. look at these 4 feathers (?) on his bum too.
a lot of dumplings
This is just a random thought, but have you tried doing an 'agentic' pelican?
As in have the model consider its generated SVG, and gradually refine it, using its knowledge of the relative positions and proportions of the shapes generated, and have it spin for a while, and hopefully the end result will be better than just oneshotting it.
Or maybe going even one step further - most modern models have tool use and image recognition capabilities - what if you have it generate an SVG (or parts/layers of it, as per the model's discretion) and feed it back to itself via image recognition, and then improve on the result.
I think it'd be interesting to see, as for a lot of models, their oneshot capability in coding is not necessarily correlated with their in-harness ability, and the latter is what really matters.
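The loop being proposed can be sketched roughly like this. Note this is a hypothetical illustration, not anyone's actual harness: `generate_svg()` and `critique()` are stand-ins for real model calls (e.g. a vision-capable model via the OpenRouter API), stubbed out here so the control flow runs standalone.

```python
def generate_svg(prompt, feedback=None):
    # Stub: a real version would call an LLM with the prompt
    # (plus any feedback from the previous round).
    base = '<svg xmlns="http://www.w3.org/2000/svg"><!-- pelican -->'
    if feedback:
        base += f"<!-- revised after feedback: {feedback} -->"
    return base + "</svg>"

def critique(svg):
    # Stub: a real version would render the SVG to a PNG and
    # feed the image back to the model for visual critique.
    # Returns None when the model is satisfied with the result.
    return None if "revised" in svg else "rear wheel intersects the frame"

def refine(prompt, max_rounds=3):
    # Generate, critique, and regenerate until the critique
    # passes or we run out of rounds.
    svg = generate_svg(prompt)
    for _ in range(max_rounds):
        feedback = critique(svg)
        if feedback is None:
            break
        svg = generate_svg(prompt, feedback)
    return svg

result = refine("Generate an SVG of a pelican riding a bicycle")
```

With the stubs above, the loop terminates after one revision; with real model calls, the interesting question is whether the critique step actually improves the drawing rather than converging on something blander.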
I tried that for the GPT-5 launch - a self-improving loop that renders the SVG, looks at it and tries again - and the results were surprisingly disappointing.
I should try it again with the more recent models.
I see, thanks. I guess most current models are not yet trained for this loop.
Could you please try with Opus 4.7? I think there's a chance of it doing well, considering the design/vision focus.
The Flash one is pretty impressive. Might be my favorite so far in the pelican-riding-a-bicycle series
DeepSeek pelicans are the angriest pelicans I’ve seen so far.
they're just late for work.
They're stressed pelicans from Hangzhou.
996 Pelican, lol
Being a bicycle geometry nerd I always look at the bicycle first.
Let me tell you how much the Pro one sucks... It looks like a failed Pedersen[1]. The rear wheel intersects with the bottom bracket, so it wouldn't even roll. Or rather, this bike couldn't exist.
The flash one looks surprisingly correct, with some wild fork offset and the slackest of seat tubes. It's got some lowrider[2] aspirations with the small wheels, but with longer, Rivendellish[3], chainstays. The seat post has a different angle than the seat tube, so good luck lowering that.
[1] https://en.wikipedia.org/wiki/Pedersen_bicycle
[2] https://en.wikipedia.org/wiki/Lowrider_bicycle
[3] https://www.rivbike.com/
This is an excellent comment. Thanks for this - I've only ever thought about whether the frame is the right shape, I never thought about how different illustrations might map to different bicycle categories.
Some other reactions:
I wonder which model will try some more common spoke lacing patterns. Right now there seems to be a preference for radial lacing, which is not super common (but simple to draw). The Flash and Pro ones use 16-spoke rims, which actually exist[1] but are not super common.
The Pro model fails badly at the spokes. Heck, the spokes sit on the outside of the drive side of the rim and tire. Have a nice ride riding on the spokes (instead of the tire) welded to the side of your rim.
Both bikes have the drive side on the left, which is very very uncommon. That can't exist in the training data.
[1] https://cicli-berlinetta.com/product/campagnolo-shamal-16-sp...
The Pedersen looks like someone failed the "draw a bicycle" test and decided to adjust the universe.
I think the pelican on a bike is known widely enough that it ceases to be useful as a benchmark. There is even a pelican briefly appearing in the promo video of GPT-5, if I'm not mistaken: https://openai.com/gpt-5/. So the companies are apparently aware of it.
It was a bigger deal in the Gemini 3.1 launch: https://x.com/JeffDean/status/2024525132266688757
What was your prompt for the image? Apologies if this should be obvious.
>Generate an SVG of a pelican riding a bicycle
at the top of the linked pages.
To me this is the perfect proof that
1) LLMs are not AGI. Surely if they were AGI, Pro would do better than Flash?
2) and because of the above, the pelican example is most likely already being benchmaxxed.
Is it then DeepSeek hosted by DeepSeek?
How much does the drawing change if you ask it again?
I really like the pro version. The pelican is so cute.
Where is the GPT 5.5 Pelican?
https://news.ycombinator.com/item?id=47879092#47880421
In the 5.5 topic
Why they so angry?
[flagged]
It's just Simon Willison (the person you are replying to) who always makes a pelican, as his personal flippant benchmark. It's not that deep.
No benchmark will be perfect, especially if it's public but it's a fun experiment to visually see how these models get better and better.
Why is it so wrong?
Thanks for the "scientific air" remark, that gave me a genuine LOL.
"The difference between screwing around and science is writing it down" -- Adam Savage
This should not be the top comment on every model release post. It's getting tiring.
This should be the bottom comment on the pelican comment on every model release post.
Clearly the top comment should be "Imagine a beowulf cluster of Deepseek v4!"
2 replies →