Comment by FlyingSnake

1 day ago

At this point drawing these Pelicans must be in the training data sets.

not if I can help it!

https://github.com/scosman/pelicans_riding_bicycles

Could be! Simon wrote about that here though https://simonwillison.net/2025/Nov/13/training-for-pelicans-...

  • > If a model finally comes out that produces an excellent SVG of a pelican riding a bicycle you can bet I’m going to test it on all manner of creatures riding all sorts of transportation devices.

    This relies on the false premise that, if they included it in their training dataset, the result would be perfect. All they need is to be good enough and better than the others, not perfect.

    • I'm not sure if we can have a "perfect" Pelican riding a bicycle. Like, I could probably commission a highly experienced artist to draw one and I don't think it would be perfect. The legs would probably have to be too long, or pedals oddly placed, or handles strange, or wings with hands.

      Based on the one Simon commented though, I'd say we're in decent territory to try the latter part of his hypothesis.

Clearly not.

I mean the prompt was succinct and clear, as always - and it still decided to hallucinate multiple features (animation + controls) beyond the prompt.

I'd also like to point out that to date no drawing was actually good from an actual quality perspective (as in, compared to what a decent designer would throw together).

They're always only "good" from the perspective of it being a one-shot, low-effort prompt. Very little content for training purposes.

  • The way I’ve come to think of LLMs is that what they produce in a single reply, even with thinking turned up, is akin to what you’d do in a single short session of work.

    And so if you ask it to do something big it will do a very surface level implementation. But if you have it iterate many times, or give it small pieces each time, you’ll end up with something closer to what a human would do.

    I imagine the pelican test but done in a harness that has the agents iterate 10+ times would be closer to what you’d expect, especially if a visual model was critiquing each time.

    • Yeah, this is how I use AI. Instead of a single session one-shot, it's usually limited to single targeted edits, and then I steer it on each step. Takes longer but the output is actually what I want.
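
For anyone who wants to try the iterate-and-critique idea from this subthread, here is a minimal sketch of what such a harness loop might look like. Everything in it is an assumption for illustration: `iterate_with_critic`, the `generate` and `critique` callables, the 0-to-1 score, and the stub functions are hypothetical stand-ins, not any real model API. In practice `generate` would call the drawing model and `critique` would call a vision model on the rendered SVG.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures, not a real library API:
#   generate(prompt, previous_draft, feedback) -> new draft
#   critique(prompt, draft) -> (score in [0, 1], feedback text)
Generate = Callable[[str, Optional[str], Optional[str]], str]
Critique = Callable[[str, str], Tuple[float, str]]


def iterate_with_critic(
    prompt: str,
    generate: Generate,
    critique: Critique,
    max_rounds: int = 10,
    target_score: float = 0.9,
) -> Tuple[str, float, int]:
    """Refine a draft until the critic is satisfied or rounds run out.

    Returns (best_draft, best_score, rounds_used).
    """
    draft = generate(prompt, None, None)  # first attempt, no feedback yet
    best_draft, best_score = draft, float("-inf")
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        score, feedback = critique(prompt, draft)
        if score > best_score:
            # Keep the best draft seen so far; a later revision may regress.
            best_draft, best_score = draft, score
        if score >= target_score or rounds == max_rounds:
            break
        # Feed the critic's notes back in and ask for a revision.
        draft = generate(prompt, draft, feedback)
    return best_draft, best_score, rounds


# Stub model and critic so the loop can be run without any API access.
def fake_generate(prompt: str, previous: Optional[str], feedback: Optional[str]) -> str:
    if previous is None:
        return "draft-1"
    return f"draft-{int(previous.split('-')[1]) + 1}"


def fake_critique(prompt: str, draft: str) -> Tuple[float, str]:
    version = int(draft.split("-")[1])
    return min(1.0, version * 0.25), "add more detail"


if __name__ == "__main__":
    print(iterate_with_critic("pelican riding a bicycle", fake_generate, fake_critique))
```

Tracking the best-scoring draft rather than the latest one is a design choice: it guards against a revision that makes things worse, which the human-steered workflow described above handles by eye.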