Comment by petercooper
4 hours ago
Because image models at the basic level are just text tokens in, image tokens out. You'd need an agentic process on top to come up with a strategy, review output, try again, and so on.
I believe Nano Banana and gpt-image-2 have a little of this going on, but it's like asking a model to one-shot some code vs having an agentic harness with tools do it. Even the most basic agent can produce better code than ChatGPT can.
No comments yet
Contribute on Hacker News ↗