Comment by vunderba

3 days ago

The site is broken up into "Editing Comparison" and "Generative Comparison" sections.

Generative: https://genai-showdown.specr.net

Editing: https://genai-showdown.specr.net/image-editing

Style is mostly irrelevant for editing, since the goal is to integrate seamlessly with the existing image. The focus is on performing relatively surgical edits or modifications to existing imagery while minimizing changes to the rest of the image. It is also primarily concerned with realism, though there are some illustrative examples (the JAWS poster, Great Wave off Kanagawa).

This contrasts with the generative section, though even there the emphasis is on prompt adherence, and style/fidelity take a back seat (which is honestly what 99% of existing generative benchmarks already focus on).

Oh, thank you for your reply. We may have different definitions of style and what editing would mean.

If you look, for example, at "Mermaid Disciplinary Committee", every single image is in a very different style, each of which you could consider the model's assumed default for that specific prompt. It's quite obvious that these styles were 'baked into' the models, and it's not clear how far you can steer them toward a specific style. If you look at "The Yarrctic Circle", a lot more models default to a kind of "generic concept art" style (the "by greg rutkowski" meme), but even then I would classify the results as falling into at least five distinct styles. So for me this benchmark is not checking style at all, unless you consider style to be just four rough categories (cartoon, anime, realistic, painterly).

So regarding image editing, I did my own tests at the first release of Flux tools and found it was almost impossible to get any decent results in certain styles, specifically cartoon and concept-art styles. I think the tools focus on what imaginary marketing people would want (like "put this can of sugary beverage into an idyllic scene") rather than such use cases. So edits like "color this" or other style-sensitive changes come out just terrible, and certainly unusable.

I didn't go very far with my own benchmarks because my results were just so bad. But for example, here's a piece of line art with the instruction to color it (I can't remember the exact prompt, I didn't take notes).

https://woolion.art/assets/img/ai/ai_editing.webp

The images are, in order: original, ChatGPT, Flux.

Still, you can see that ChatGPT just throws everything out and makes no attempt at respecting the style. Flux is quite bad, but it follows the design much more closely (although it gets completely confused by it), so it seems that with a whole lot of work you could get something out of it.

  • Yeah so NOVEL style transfer without the use of a trained LoRA is, to my knowledge, still a relatively unsolved problem. Even in SOTA models like Nano Banana Pro, if you attach several images with a distinct artistic style that is outside of its training data and use a prompt such as:

    "Using the attached images as stylistic references, create an image of X"

    It falls down pretty hard.

    https://imgur.com/a/o3htsKn

    • I'm pretty sure that some model at least advertised that this would work. I also think your example was in the training data at some point at least, but I suspect these styles get kind of pruned when the models are steered toward the "aesthetically pleasing" outputs that are often used as benchmarks. Thanks for the replies, it's quite informative.
