Comment by simonw

1 year ago

Generating images and consuming images are very different challenges, which for most models use entirely different systems (ChatGPT constructs prompts to DALL-E for example: https://simonwillison.net/2023/Oct/26/add-a-walrus/ )

Evaluating vision LLMs on their ability to improve their own generation of images doesn't make sense to me. That's why I enjoy torturing new models with my pelican on a bicycle SVG benchmark!