Comment by ionwake

6 days ago

only SVG counts tho, don't know why

Willison chose this task because (unlike actual images of pelicans) it was clearly not in the training data, but could be reasoned about and composed from what's there. But just like those "how many golf balls can you fit in a 747?" interview questions, it should now be retired.

  • Thank you for the reply. Would something like a squirrel flying a hang glider as an SVG be a good new test? Or would that be indirectly in the training data too?

It's a test of text-based LLMs to see how good they are at SVG geometry. Video models are a different category of software entirely.