Comment by simonw

8 days ago

I like the Python library that accompanies this: https://github.com/minimaxir/gemimg

I added a CLI to it (using Gemini CLI) and submitted a PR, you can run that like so:

  GEMINI_API_KEY="..." \
  uv run --with https://github.com/minimaxir/gemimg/archive/d6b9d5bbefa1e2ffc3b09086bc0a3ad70ca4ef22.zip \
    python -m gemimg "a racoon holding a hand written sign that says I love trash"

Result in this comment: https://github.com/minimaxir/gemimg/pull/7#issuecomment-3529...

@simonw: slight tangent but super curious how you managed to generate the preview of that gemini-cli terminal session gist - https://gistpreview.github.io/?17290c1024b0ef7df06e9faa4cb37...

is this just a manual copy/paste into a gist with some html css styling; or do you have a custom tool à la amp-code that does this more easily?

The author went to great lengths about open source early on. I wonder if they'll cover the QwenEdit ecosystem.

I'm exceptionally excited about Chinese editing models. They're getting closer and closer to NanoBanana in terms of robustness, and they're open source. This means you can supply masks and kernels and do advanced image operations, integrate them into visual UIs, etc.

You can even fine tune them and create LoRAs that will do the style transferring tasks that Nano Banana falls flat on.

I don't like how closed the frontier US models are, and I hope the Chinese kick our asses.

That said, I love how easy it'll be to distill Nano Banana into a new model. You can pluck training data right out of it: ((any image, any instruction) -> completion) tuples.

  • The Qwen-Edit images from my GenAI Image Editing Showdown site were all generated from a ComfyUI workflow on my machine - it's shockingly good for an open-weight model. It was also the only model that scored a passing grade on the Van Halen M&M test (even compared against Nanobanana)

    https://genai-showdown.specr.net/image-editing

    • Ha I created a Van Halen M&M test for text prompts. I would include an instruction demanding that the response contain <yellow_m&m> and <red_m&m> but never <brown_m&m>. Then I would fail any llm that did not include any m&ms, or if they wrote anything about the <brown_m&m> in the final output.

  • > I don't like how closed the frontier US models are, and I hope the Chinese kick our asses.

    For imagegen, agreed. But for textgen, Kimi K2 thinking is by far the best chat model at the moment from my experience so far. Not even "one of the best", the best.

    It has frontier level capability and the model was made very tastefully: it's significantly less sycophantic and more willing to disagree in a productive, reasonable way rather than immediately shutting you out. It's also way more funny at shitposting.

    I'll keep using Claude a lot for multimodality and artifacts but much of my usage has shifted to K2. Claude's sycophancy is particular is tiresome. I don't use ChatGPT/Gemini because they hide the raw thinking tokens, which is really cringe.

    • Claude Sonnet 4.5 doesn't even feel sycophantic (in the 4o) way, it feels like it has BPD. It switches from desperately agreeing with you to moralizing lectures and then has a breakdown if you point out it's wrong about anything.

      Also, yesterday I asked it a question and after the answer it complained about its poorly written system prompt to me.

      They're really torturing their poor models over there.

      1 reply →

  • I've been keeping an eye on Qwen-Edit/Wan 2.2 shenanigans and they are interesting: however actually running those types of models is too cumbersome and in the end unclear if it's actually worth it over the $0.04/image for Nano Banana.

I just merged the PR and pushed 0.3.1 to PyPI. I also added README documentation and allowed for a `gemimg` entrypoint to the CLI via project.scripts as noted elsewhere in the thread.

Any reason for not also adding a project.scripts entry for pyproject.toml? That way the CLI (great idea btw) could be installed as a tool by uv.

  • I decided to avoid that purely to keep changes made to the package as minima as possible - adding a project.scripts means installing it adds a new command alias. My approach changes nothing other than making "python -m gemimg" do something useful.

    I agree that a project.scripts would be good but that's a decision for the maintainer to take on separately!