Comment by weq
8 days ago
I don't get how these tools are considered good when they can't even do a simple thing like depicting this scene.
> i was to bring awareness to the dangers of dressing up like a seal while surfboarding (ie. wearing black wetsuites, arms hanging over the board). Create a scene from the perspective of a shark looking up from the bottom of the ocean into a clear blue sky with silhouettes of a seal and a surfer and fishing boat with line dangling in the water and show how the shark contemplates attacking all these objects because they look so similiar.
I haven't found a model yet that can process that description, or any variation, into a scene that is usable and makes sense visually to anyone older than a 1st grader. They will never place the seal, surfer, shark or boat in the correct location to make sense visually. Typically everyone is underwater and the sizing of everything is wrong. You tell them the image is wrong, to place the person on top of the water, and they can't. Can someone please link to a model that is capable, or tell me what I am doing wrong? How can you claim to process words into images in a repeatable way when these systems can't deal with multiple constraints at once?
You'll have somewhat better luck if you fix the spelling errors.
https://lmarena.ai/c/019a84ec-db09-7f53-89b1-3b901d4dc6be
https://gemini.google.com/share/da93030f131b
Obviously neither is good, but it is better.
I think image models could produce far more editable outputs if, e.g., they output multi-layer PSDs.