Comment by dvt
12 hours ago
This is an amazing test, and it's kinda funny how terrible gpt-2-image is. I'd take "plagiarized" images (e.g. Google search and copy-paste) any day over the awful OpenAI result. It doesn't even seem like they have a sanity-checker/post-processing "did I follow the instructions correctly?" step, because the digit-style constraint violation should be easy to catch automatically. It's also expensive as shit for an image that's essentially unusable.
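The "sanity checker" step suggested above could look like a simple generate-then-verify loop. This is purely an illustrative sketch, not OpenAI's actual pipeline: `generate()` and `violates_constraint()` are made-up stand-ins for an image-model call and a post-processing check (e.g. OCR-ing the digits and comparing their style against the prompt's constraint).

```python
# Hypothetical generate-then-verify loop. Nothing here is a real
# OpenAI API; generate() and violates_constraint() are stand-ins.

def generate(prompt, attempt):
    # Stand-in for an image-model call; returns a fake "image" record.
    # In this toy, the model only satisfies the constraint on retry.
    return {"prompt": prompt, "digits_ok": attempt > 0}

def violates_constraint(image):
    # Stand-in for a checker, e.g. OCR the digits and verify the
    # requested style was actually used.
    return not image["digits_ok"]

def generate_with_sanity_check(prompt, max_retries=3):
    for attempt in range(max_retries):
        image = generate(prompt, attempt)
        if not violates_constraint(image):
            return image
    raise RuntimeError("constraint still violated after retries")

result = generate_with_sanity_check("chart with pixel-style digits")
```

The point isn't the retry loop itself but that a cheap post-hoc check could catch an easily detectable violation before the expensive result is handed back to the user.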
This is from Gemini - https://lens.usercontent.google.com/banana?agsi=CmdnbG9iYWw6...
Did it correctly follow the instructions? I don't know my Pokémon well enough to tell.
Essentially yes (the bottom got distorted), but Gemini uses Nano Banana Pro or Nano Banana 2, so it's not a surprising result. The image I linked uses the raw API.
That is interesting, because I feel gpt-image-1 did have that feature.
(source: https://chatgpt.com/share/69e83569-b334-8320-9fbf-01404d18df...)
You are comparing ChatGPT to a raw image model. These are two completely different things. ChatGPT takes your input, modifies the prompt, passes it to the image model, and then may read the resulting image and provide output. A raw image model accessed through the API just takes your prompt verbatim and generates an image.
Nano Banana Pro and ChatGPT Images 2.0 also tweak the prompt, because they can think.
I wouldn't say it's terrible, but I also wouldn't say it's a huge step forward in quality compared to what I've seen before from AI.