Comment by diggan

1 day ago

Lets give it a try, if you're willing to be the experiment subject :)

The prompt is "Generate an SVG of a pelican riding a bicycle" and you're supposed to write it by hand, so no graphical editor. The specification is here: https://www.w3.org/TR/SVG2/

I'm fairly certain I'd lose interest in getting it right before I got something better than most of those.

8 comments

diggan

zahlman 1 day ago

> The colors use traditional bicycle brown (#8B4513) and a classic blue for the pelican (#4169E1) with gold accents for the beak (#FFD700).

The output pelican is indeed blue. I can't fathom where the idea that this is "classic", or suitable for a pelican, could have come from.

diggan 1 day ago
My guess would be that it doesn't see the web colors (CSS color hexes) as proper hex triplets, but because of tokenization it could be something dumb like '#8B','451','3' instead. I think the same issue happens around multiple special characters after each other too.
- zahlman 21 hours ago
  
  No, it's understanding the colors properly. The SVG that the LLM created does use #4169E1 for the pelican color, and the LLM correctly describes this color as blue. The problem is that pelicans should not be blue.
- cap11235 16 hours ago
  
  Qwen3, at least, tokenizes each character of "#8B4513" separately.

mormegil 1 day ago

Did the testing prompt for LLMs include a clause forbidding the use of any tools? If not, why are you adding it here?

simonw 1 day ago
The way I run the pelican on a bicycle benchmark is to use this exact prompt:
Generate an SVG of a pelican riding a bicycle
And execute it via the model's API with all default settings, not via their user-facing interface.
Currently none of the model APIs enable tools unless you ask them to, so this method excludes the use of additional tools.
diggan 1 day ago

The models that are being put under the "Pelican" testing don't use a GUI to create SVGs (either via "tools" or anything else), they're all Text Generation models so they exclusively use text for creating the graphics.
There are 31 posts listed under "pelican-riding-a-bicycle" in case you wanna inspect the methodology even closer: https://simonwillison.net/tags/pelican-riding-a-bicycle/