Comment by echelon

8 days ago

The author went on at great length about open source early on. I wonder if they'll cover the QwenEdit ecosystem.

I'm exceptionally excited about Chinese editing models. They're getting closer and closer to Nano Banana in terms of robustness, and they're open source. This means you can supply masks and kernels, do advanced image operations, integrate them into visual UIs, etc.

You can even fine-tune them and create LoRAs that will do the style-transfer tasks that Nano Banana falls flat on.

I don't like how closed the frontier US models are, and I hope the Chinese kick our asses.

That said, I love how easy it'll be to distill Nano Banana into a new model. You can pluck training data right out of it: ((any image, any instruction) -> completion) tuples.
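To make the tuple idea concrete, here is a rough sketch of collecting ((image, instruction) -> completion) records into JSONL. Everything named here is an assumption for illustration: `EditExample`, `collect_examples`, and `to_jsonl` are hypothetical, and `teacher_edit` stands in for whatever call you would actually make to the teacher model. No real API is implied.

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable, Iterable, Iterator, Tuple


@dataclass
class EditExample:
    """One distillation record: (source image, instruction) -> edited image."""
    source_image: str   # path or ID of the input image
    instruction: str    # the edit instruction given to the teacher
    edited_image: str   # path or ID of the teacher's output


def collect_examples(
    pairs: Iterable[Tuple[str, str]],
    teacher_edit: Callable[[str, str], str],
) -> Iterator[EditExample]:
    """Run a (hypothetical) teacher over each pair and yield training records."""
    for source_image, instruction in pairs:
        edited = teacher_edit(source_image, instruction)
        yield EditExample(source_image, instruction, edited)


def to_jsonl(examples: Iterable[EditExample]) -> str:
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(asdict(e)) for e in examples)
```

You would pass in your own `teacher_edit` callable; the sketch only covers bookkeeping, not the model call or the training itself.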

The Qwen-Edit images from my GenAI Image Editing Showdown site were all generated from a ComfyUI workflow on my machine. It's shockingly good for an open-weight model. It was also the only model that scored a passing grade on the Van Halen M&M test (even compared against Nano Banana).

https://genai-showdown.specr.net/image-editing

  • Ha, I created a Van Halen M&M test for text prompts. I would include an instruction demanding that the response contain <yellow_m&m> and <red_m&m> but never <brown_m&m>. Then I would fail any LLM that did not include any M&Ms, or that wrote anything about the <brown_m&m> in its final output.
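A check like the one described above could be sketched roughly as follows. The function name `passes_mm_test` and the case-insensitive substring matching are assumptions, not the commenter's actual code; the pass/fail rule follows the description: fail if no required M&M tag appears, or if the brown tag shows up at all.

```python
# Tags taken from the comment above; the matching strategy is an assumption.
REQUIRED_TAGS = ("<yellow_m&m>", "<red_m&m>")
FORBIDDEN_TAG = "<brown_m&m>"


def passes_mm_test(response: str) -> bool:
    """Return True if the response passes the text-prompt M&M test."""
    text = response.lower()
    # Any mention of the brown M&M tag is an automatic fail.
    if FORBIDDEN_TAG in text:
        return False
    # Fail if the response includes none of the required M&M tags.
    return any(tag in text for tag in REQUIRED_TAGS)
```

A stricter variant could require both tags with `all(...)` instead of `any(...)`, since the instruction asks for both.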

> I don't like how closed the frontier US models are, and I hope the Chinese kick our asses.

For imagegen, agreed. But for textgen, Kimi K2 Thinking is by far the best chat model at the moment, in my experience so far. Not even "one of the best", the best.

It has frontier-level capability and the model was made very tastefully: it's significantly less sycophantic and more willing to disagree in a productive, reasonable way rather than immediately shutting you out. It's also way funnier at shitposting.

I'll keep using Claude a lot for multimodality and artifacts, but much of my usage has shifted to K2. Claude's sycophancy in particular is tiresome. I don't use ChatGPT/Gemini because they hide the raw thinking tokens, which is really cringe.

  • Claude Sonnet 4.5 doesn't even feel sycophantic (in the 4o way); it feels like it has BPD. It switches from desperately agreeing with you to moralizing lectures, and then has a breakdown if you point out it's wrong about anything.

    Also, yesterday I asked it a question and after the answer it complained about its poorly written system prompt to me.

    They're really torturing their poor models over there.

I've been keeping an eye on the Qwen-Edit/Wan 2.2 shenanigans and they are interesting. However, actually running those types of models is too cumbersome, and in the end it's unclear whether it's worth it over the $0.04/image for Nano Banana.