Comment by justhw

10 days ago

I came to the same conclusion as the authors after generating thousands of thumbnails[1]. OpenAI alters faces too much and smooths out details by default. NanoBanana is the best but lacks a high-fidelity option. SeeDream is catching up to NanoBanana and is sometimes better. It's been too long since OpenAI's gpt-img-1 came out; I hope they launch a better model soon.

[1] = https://thumbnail.ai/

I am probably at 50k-60k image generations from various models.

It is just very hard to make any generalizations because any single prompt will lead to so many different types of images.

The only thing I would really say to generalize is every model has strengths and weaknesses depending on what you are going for.

It is also generally very hard to explore all the possibilities of a model. So many times I thought I had seen what a model could do, only to be completely blown away by a particular generation.

  • What do you even do with 50k images? Even at just 10 seconds of attention each, that's a solid week of waking time.

    • YouTube is full of AI slop right now; it doesn't take much imagination to recognise how scammers (listed on an exchange or not) are utilising it... Take, for instance, a political influence organisation generating avatars for vast bot networks that are implanted into social media to sway opinion.

  • But someone has to know and evaluate all of those strengths and weaknesses, keep up with new models, etc. That's work someone has to do, or their product loses in quality. But that's fine when all products lose quality across the board.

  • Why so many?

    • FWIW, when I do txt2img or img2img locally I have the batch size set to 8-12 (so up to 12 variation images are generated from the same seed in one run), so it's fairly easy to numerically end up with tens of thousands of images, of which usually 99% are not good.

I don't know if you looked at the same article as I did, but NanoBanana seems to be by far the worst at following the prompts. Just look at the heat-map images.

  • Half the time NanoBanana doesn't do anything to the photo, in my experience, which some of these examples also confirm.

  • You can rewrite the prompts yourself, though, to be clearer about what you want. The other issues you can't change.

Do you run thumbnail.ai? I would really like to try it, but I'm not going to pay before I've seen even a single generated thumbnail in my context. Is it unviable to let people generate at least a few thumbnails before they have to decide whether to pay?

I am a small-time YouTuber.

I run a fairly comprehensive model comparison site (generative and editing). In my experience:

NanoBanana and Flux Kontext are the models that get closest to traditional SDXL inpainting techniques.

Seedream is a strong contender by virtue of its ability to natively handle higher resolutions (up to around 4 megapixels), so you lose less detail; however, it also tends to alter the color palette more often than not.

Finally, GPT-image-1 (yellowish filter notwithstanding) exhibits very strong prompt adherence but will almost always change a number of the details.