Comment by echelon
8 days ago
The author went to great lengths discussing open source early on. I wonder if they'll cover the QwenEdit ecosystem.
I'm exceptionally excited about Chinese editing models. They're getting closer and closer to NanoBanana in terms of robustness, and they're open source. This means you can supply masks and kernels and do advanced image operations, integrate them into visual UIs, etc.
You can even fine-tune them and create LoRAs that handle the style-transfer tasks Nano Banana falls flat on.
I don't like how closed the frontier US models are, and I hope the Chinese kick our asses.
That said, I love how easy it'll be to distill Nano Banana into a new model. You can pluck training data right out of it: ((any image, any instruction) -> completion) tuples.
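A rough sketch of what harvesting those tuples might look like, assuming a hypothetical nano_banana_edit() client and a local folder of source images (all names here are made up for illustration, not an actual API):

    import json, base64, pathlib

    def nano_banana_edit(image_bytes, instruction):
        """Hypothetical wrapper around the hosted editing API; returns edited image bytes."""
        raise NotImplementedError

    instructions = ["make it night time", "add falling snow", "convert to watercolor"]
    dataset = []
    for path in pathlib.Path("source_images").glob("*.png"):
        img = path.read_bytes()
        for instr in instructions:
            out = nano_banana_edit(img, instr)
            # Store the ((image, instruction) -> completion) tuple as one JSONL record.
            dataset.append({
                "image": base64.b64encode(img).decode(),
                "instruction": instr,
                "completion": base64.b64encode(out).decode(),
            })

    pathlib.Path("distill_dataset.jsonl").write_text(
        "\n".join(json.dumps(r) for r in dataset))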
The Qwen-Edit images from my GenAI Image Editing Showdown site were all generated from a ComfyUI workflow on my machine - it's shockingly good for an open-weight model. It was also the only model that scored a passing grade on the Van Halen M&M test (even compared against Nanobanana).
https://genai-showdown.specr.net/image-editing
Ha, I created a Van Halen M&M test for text prompts. I would include an instruction demanding that the response contain <yellow_m&m> and <red_m&m> but never <brown_m&m>. Then I would fail any LLM that did not include any m&ms, or that wrote anything about the <brown_m&m> in the final output.
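A minimal sketch of scripting that pass/fail check, using the tag names from the comment above (everything else is illustrative):

    import re

    def van_halen_check(output: str) -> bool:
        """Pass only if the output uses at least one allowed M&M tag and never mentions the brown one."""
        if re.search(r"<brown_m&m>", output, re.IGNORECASE):
            return False
        return bool(re.search(r"<(yellow|red)_m&m>", output, re.IGNORECASE))

    print(van_halen_check("Sure! Here you go: <yellow_m&m> <red_m&m>"))  # True
    print(van_halen_check("I removed the <brown_m&m> as requested."))    # False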
> I don't like how closed the frontier US models are, and I hope the Chinese kick our asses.
For imagegen, agreed. But for textgen, Kimi K2 Thinking is by far the best chat model in my experience so far. Not even "one of the best", just the best.
It has frontier-level capability and the model was made very tastefully: it's significantly less sycophantic and more willing to disagree in a productive, reasonable way rather than immediately shutting you out. It's also way funnier at shitposting.
I'll keep using Claude a lot for multimodality and artifacts, but much of my usage has shifted to K2. Claude's sycophancy in particular is tiresome. I don't use ChatGPT/Gemini because they hide the raw thinking tokens, which is really cringe.
Claude Sonnet 4.5 doesn't even feel sycophantic (in the 4o way); it feels like it has BPD. It switches from desperately agreeing with you to moralizing lectures, and then has a breakdown if you point out it's wrong about anything.
Also, yesterday I asked it a question and after the answer it complained to me about its poorly written system prompt.
They're really torturing their poor models over there.
It rubs the data on its skin or else it gets the prompt again!
I've been keeping an eye on Qwen-Edit/Wan 2.2 shenanigans and they are interesting. However, actually running those types of models is cumbersome enough that in the end it's unclear whether it's worth it over the $0.04/image for Nano Banana.
I was skeptical about the notion of running similar models locally as well, but the person who did this (https://old.reddit.com/r/StableDiffusion/comments/1osi1q0/wa... ) swears that they generated it locally, just letting a single 5090 crunch away for a week.
If that's true, it seems worth getting past the 'cumbersome' aspects. This tech may not put Hollywood out of business, but it's clear that the process of filmmaking won't be recognizable in 10 years if amateurs can really do this in their basements today.
Neural Viz has been putting out some extremely high-quality content recently; these are the closest things I've seen to approaching Hollywood level:
https://www.youtube.com/watch?v=5bYA2Rv2CQ8
https://www.youtube.com/watch?v=rfTnW8pl3DE
Takes a couple mouse clicks in ComfyUI
On that subject - ComfyUI is not the future of image gen. It's an experimental rope bridge.
Adobe's conference last week points to the future of image gen. Visual tools where you mold images like clay. Hands on.
Comfy appeals to the 0.01% who like toolkits such as TouchDesigner, Nannou, and ShaderToy.