Comment by woolion

3 days ago

The comparisons are very useful but also quite limited in terms of styles. Models vary enormously in their ability to follow a given style rather than steering toward their own.

It's pretty obvious that OpenAI is terrible at it -- it is known for its unmissable touch. For Flux, however, it really depends on the style. They posted at some point that they changed their training to avoid averaging different styles together, which is the ultimate AI look. But this is at odds with the goal of directly generating images that are visually appealing, so style matching is going to be a problem for a while, at least.

The site is broken up into "Editing Comparison" and "Generative Comparison" sections.

Generative: https://genai-showdown.specr.net

Editing: https://genai-showdown.specr.net/image-editing

Style is mostly irrelevant for editing, since the goal is to integrate seamlessly with the existing image. The focus is on performing relatively surgical edits or modifications to existing imagery while minimizing changes to the rest of the image. It is also primarily concerned with realism, though there are some illustrative examples (the JAWS poster, The Great Wave off Kanagawa).

This contrasts with the generative section, though even there the emphasis is on prompt adherence, and style/fidelity take a backseat (which is honestly what 99% of existing generative benchmarks already focus on).

  • Oh, thank you for your reply. We may have different definitions of style and of what editing means.

    If you look, for example, at "Mermaid Disciplinary Committee", every single image is in a very different style, each of which you can consider the model's default assumption for that specific prompt. It's quite obvious that these styles were 'baked into' the models, and it's not clear how far you can steer them toward a specific style. If you look at "The Yarrctic Circle", a lot more models default to a kind of "generic concept art" style (the "by Greg Rutkowski" meme), but even then I would classify the results as at least 5 distinct styles. So for me this benchmark is not checking style at all, unless you consider style to be just around 4 categories (cartoon, anime, realistic, painterly).

    Regarding image editing, I did my own tests at the first release of the Flux tools, and found it was almost impossible to get any decent results in certain styles, specifically cartoon and concept-art styles. I think the tools focus on what imaginary marketing people would want (like "put this can of sugary beverage into an idyllic scene") rather than such use cases. So edits like "color this" or other changes would just be terrible, and certainly unusable.

  • I didn't go very far with my own benchmarks because my results were just so bad. But for example, here's a piece of line art with the instruction to color it (I can't remember the exact prompt; I didn't take notes).

    https://woolion.art/assets/img/ai/ai_editing.webp

    The images are, in order: original, ChatGPT, Flux.

    Still, you can see that ChatGPT just throws everything out and does not make even a minimal attempt at respecting the style. Flux is quite bad, but it follows the design much more closely (although it gets completely confused by it), so it seems that with a whole lot of work you could get something out of it.

    • Yeah, so NOVEL style transfer without the use of a trained LoRA is, to my knowledge, still a relatively unsolved problem. Even in SOTA models like Nano Banana Pro, if you attach several images with a distinct artistic style that is outside of the training data and use a prompt such as:

      "Using the attached images as stylistic references, create an image of X"

      It falls down pretty hard.

      https://imgur.com/a/o3htsKn