Comment by simonw

1 day ago

I generated pelicans riding bicycles on both thinking level low and thinking level high:

https://gist.github.com/simonw/68560eddb0b268a8417f80ceb7304...

The high one is notably better - the bicycle frame is the correct shape, unlike thinking level low.

For comparison, here's Opus 4.7: https://gist.github.com/simonw/afcb19addf3f38eb1996e1ebe749c...

It's pretty safe to say that AI will be used on the battlefield making real life and death decisions before it will be able to render a decent pelican on a bike in SVG.

  • It already has been and this has been widely written about. AI was used to identify and prioritize targets for the US to bomb in Iran.

    Here's an article from 2 months ago for example: https://www.theguardian.com/technology/commentisfree/2026/ma...

    It was also implicated in the bombing of a girls' elementary school which left 168 dead. The US did a "triple tap" to kill any first responders.

    https://www.theguardian.com/news/2026/mar/26/ai-got-the-blam...

    https://www.theguardian.com/technology/2026/apr/01/dont-blam...

    • I read the article and it doesn’t say it was used for targeting or prioritizing?

      > Neither Claude nor any other LLMs detects targets, processes radar, fuses sensor data or pairs weapons to targets. LLMs are late additions to Palantir’s ecosystem. In late 2024, years after the core system was operational, Palantir added an LLM layer – this is where Claude sits – that lets analysts search and summarise intelligence reports in plain English

      There are a lot of humans in that loop who make those decisions.

  • I think it's beyond decent. I don't understand how people are not more impressed by this. Just a few years ago the only expectation would be garbled nonsense.

  • Haha, yeah. I tried to get it to create an SVG of scissors and it was hopelessly overwhelmed. I think at least the SVG design niche will be safe a little while longer.

  • the battlefield sounds much easier. worst case scenario you kill somebody, but that's what you're trying to do anyways.

    if you kill somebody while trying to render a pelican on a bicycle it's a real problem.

    • In many battlefield scenarios, there is more than one "somebody" on it. The "somebody" that you kill might not be the "somebody" that you intended to kill.

      Depending on how the pelicans are created, it is entirely possible to indirectly kill "somebody" due to the externalised costs of global warming etc.

    • "shift left" on the battlefield. break down those silos. if you have to ask for permission it's already too late. remember the goal. find the bottlenecks in your system and remove them.

> the bicycle frame is the correct shape

No, the handlebar is wrong. It rotates the frame instead of the front wheel. The handlebar should be mounted on the same line as the front wheel.

Hopefully 4.9 will read my comments :)

I bet someone shares this link every time you post about bicycles, but since I didn't see anyone share it yet in this thread, I'll take the opportunity to do so:

https://www.gianlucagimini.it/portfolio-item/velocipedia/

Turns out even humans can be pretty bad at drawing bicycles :)

  • On a new model release, you can guarantee two things are in the replies to Simon. One is your link, the other is "surely the models are being trained on this now"

  • Sure, but no one is trying to force art from most people into just about every area of the economy where anyone ever pays for something visual. If you asked professional artists to draw a realistic bicycle, I'm guessing few of them would just randomly guess what the mechanical parts look like.

  • > The most unintelligible drawing has also the most unintelligible handwriting. It was made by a doctor.

    Haha

  • But if you need to draw a bicycle, you wouldn’t pick a random person in the street. You would hire an artist and you’d be guaranteed to have at least a believable one if not a perfect rendering.

    The lack of guarantees is why using an LLM is akin to gambling. Every new context is essentially picking someone out of the crowd.

Sadly, I think the correlation between this benchmark and performance is starting to break down. Still, it's a legendary idea that will be remembered and ingrained in the models forever, haha.

Here are pelicans at all of the thinking levels - low, medium, high, xhigh, max:

https://tools.simonwillison.net/markdown-svg-renderer#url=ht...

It's funny that we've reached the level where LLMs draw more correct bikes than a random person would.

Simon, does your pelican test really capture differences among models, or should you at least try it something like 10 times to average out the random effects?

I actually like the 4.7 one the most, interestingly enough. Not like you can "objectively" weigh artistic output like this.

I find the most miraculous thing about 4.7 to be that the pelican is facing left. I wonder why right-facing everything is so ubiquitous in these images.

Thanks for always providing this so promptly. I'm wondering what the next, harder challenge could be. Maybe an animated SVG?

Am I allowed to say that pelican's little helmet is adorable? I can't provide a strong computational proof, or even a shred of anecdata...

...but that pelican's little helmet is adorable.

Eventually the frontier model folks are going to pick up on your pelican on a bike test and bake in flawless results for that particular request.