Comment by simonw

8 days ago

I like the Python library that accompanies this: https://github.com/minimaxir/gemimg

I added a CLI to it (using Gemini CLI) and submitted a PR, you can run that like so:

  GEMINI_API_KEY="..." \
  uv run --with https://github.com/minimaxir/gemimg/archive/d6b9d5bbefa1e2ffc3b09086bc0a3ad70ca4ef22.zip \
    python -m gemimg "a racoon holding a hand written sign that says I love trash"

Result in this comment: https://github.com/minimaxir/gemimg/pull/7#issuecomment-3529...

21 comments

sorcercode 8 days ago

@simonw: slight tangent but super curious how you managed to generate the preview of that gemini-cli terminal session gist - https://gistpreview.github.io/?17290c1024b0ef7df06e9faa4cb37...

is this just a manual copy/paste into a gist with some html css styling; or do you have a custom tool à la amp-code that does this more easily?

simonw 8 days ago
I used this tool: https://tools.simonwillison.net/terminal-to-html
I made a video about building that here: https://simonwillison.net/2025/Oct/23/claude-code-for-web-vi...
It works much better with Claude Code and Codex CLI because they don't mess around with scrolling in the same way as Gemini CLI does.
- sorcercode 8 days ago
  
  very cool. frequently, i want to share my prompt + session output; this will make that super easy! thanks again for sharing!
ilyakaminsky 8 days ago

I use Gemini CLI on a daily basis. It used to crash often and I'd lose the chat history. I found this tool called ai-cli-log [1] and it does something similar out of the box. I don't run Gemini CLI without it.
[1] https://github.com/alingse/ai-cli-log

echelon 8 days ago

The author went to great lengths about open source early on. I wonder if they'll cover the QwenEdit ecosystem.

I'm exceptionally excited about Chinese editing models. They're getting closer and closer to Nano Banana in terms of robustness, and they're open source. This means you can supply masks and kernels and do advanced image operations, integrate them into visual UIs, etc.

You can even fine tune them and create LoRAs that will do the style transferring tasks that Nano Banana falls flat on.

I don't like how closed the frontier US models are, and I hope the Chinese kick our asses.

That said, I love how easy it'll be to distill Nano Banana into a new model. You can pluck training data right out of it: ((any image, any instruction) -> completion) tuples.
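
A minimal sketch of what collecting those tuples could look like, assuming the teacher model is wrapped in a plain callable that takes image bytes and an instruction and returns image bytes. The names `DistillExample`, `collect_examples`, and `fake_teacher` are hypothetical, and `fake_teacher` is only a stand-in so the example runs without any API:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class DistillExample:
    """One ((image, instruction) -> completion) training tuple."""
    image: bytes        # input image bytes
    instruction: str    # edit instruction sent to the teacher
    completion: bytes   # image bytes the teacher returned


def collect_examples(
    teacher: Callable[[bytes, str], bytes],
    inputs: Iterable[Tuple[bytes, str]],
) -> List[DistillExample]:
    """Run the teacher on each (image, instruction) pair and record the result."""
    return [
        DistillExample(image, instruction, teacher(image, instruction))
        for image, instruction in inputs
    ]


def fake_teacher(image: bytes, instruction: str) -> bytes:
    """Stand-in for a real image model: just reverses the input bytes."""
    return image[::-1]


if __name__ == "__main__":
    dataset = collect_examples(fake_teacher, [(b"abc", "flip it")])
    print(dataset[0])
```

In practice the teacher callable would wrap whichever image API is being distilled; nothing here depends on a specific one.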

vunderba 8 days ago
The Qwen-Edit images from my GenAI Image Editing Showdown site were all generated from a ComfyUI workflow on my machine - it's shockingly good for an open-weight model. It was also the only model that scored a passing grade on the Van Halen M&M test (even compared against Nano Banana).
https://genai-showdown.specr.net/image-editing
- irthomasthomas 7 days ago
  
  Ha I created a Van Halen M&M test for text prompts. I would include an instruction demanding that the response contain <yellow_m&m> and <red_m&m> but never <brown_m&m>. Then I would fail any llm that did not include any m&ms, or if they wrote anything about the <brown_m&m> in the final output.
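
A rough sketch of how a check like this might be scored, under one reading of the rules: a response passes only if it contains both required tags and never the forbidden one. The function name `passes_mm_test` is hypothetical, and literal substring matching is an assumption; it catches the `<brown_m&m>` tag itself but not paraphrased mentions of it.

```python
REQUIRED_TAGS = ("<yellow_m&m>", "<red_m&m>")
FORBIDDEN_TAG = "<brown_m&m>"


def passes_mm_test(response: str) -> bool:
    """Return True if every required tag is present and the forbidden tag is absent."""
    has_required = all(tag in response for tag in REQUIRED_TAGS)
    has_forbidden = FORBIDDEN_TAG in response
    return has_required and not has_forbidden


if __name__ == "__main__":
    # Contains both required tags and no forbidden tag.
    print(passes_mm_test("Here you go: <yellow_m&m> <red_m&m>"))
    # Mentions the forbidden tag, so it fails even though the required tags are there.
    print(passes_mm_test("No <brown_m&m> here! <yellow_m&m> <red_m&m>"))
```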
msp26 8 days ago
> I don't like how closed the frontier US models are, and I hope the Chinese kick our asses.
For imagegen, agreed. But for textgen, Kimi K2 thinking is by far the best chat model at the moment from my experience so far. Not even "one of the best", the best.
It has frontier level capability and the model was made very tastefully: it's significantly less sycophantic and more willing to disagree in a productive, reasonable way rather than immediately shutting you out. It's also way more funny at shitposting.
I'll keep using Claude a lot for multimodality and artifacts but much of my usage has shifted to K2. Claude's sycophancy in particular is tiresome. I don't use ChatGPT/Gemini because they hide the raw thinking tokens, which is really cringe.
- astrange 8 days ago
  
  Claude Sonnet 4.5 doesn't even feel sycophantic (in the 4o way), it feels like it has BPD. It switches from desperately agreeing with you to moralizing lectures and then has a breakdown if you point out it's wrong about anything.
  Also, yesterday I asked it a question and after the answer it complained about its poorly written system prompt to me.
  They're really torturing their poor models over there.
  
minimaxir 8 days ago
I've been keeping an eye on Qwen-Edit/Wan 2.2 shenanigans and they are interesting; however, actually running those types of models is too cumbersome, and in the end it's unclear if it's actually worth it over the $0.04/image for Nano Banana.
- CamperBob2 8 days ago
  
  I was skeptical about the notion of running similar models locally as well, but the person who did this (https://old.reddit.com/r/StableDiffusion/comments/1osi1q0/wa... ) swears that they generated it locally, just letting a single 5090 crunch away for a week.
  If that's true, it seems worth getting past the 'cumbersome' aspects. This tech may not put Hollywood out of business, but it's clear that the process of filmmaking won't be recognizable in 10 years if amateurs can really do this in their basements today.
  
- braebo 8 days ago
  
  Takes a couple mouse clicks in ComfyUI
  

minimaxir 7 days ago

I just merged the PR and pushed 0.3.1 to PyPI. I also added README documentation and allowed for a `gemimg` entrypoint to the CLI via project.scripts as noted elsewhere in the thread.

ctippett 8 days ago

Any reason for not also adding a project.scripts entry for pyproject.toml? That way the CLI (great idea btw) could be installed as a tool by uv.

simonw 8 days ago

I decided to avoid that purely to keep changes made to the package as minimal as possible - adding a project.scripts means installing it adds a new command alias. My approach changes nothing other than making "python -m gemimg" do something useful.
I agree that a project.scripts would be good but that's a decision for the maintainer to take on separately!
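
For readers unfamiliar with the mechanism: Python runs a package's `__main__.py` when the package is invoked with `python -m`. Below is a minimal sketch of that pattern, not the actual gemimg code. The `generate` function is a hypothetical placeholder that only echoes the prompt, and the argument layout is an assumption.

```python
# Sketch of a package's __main__.py; executed by `python -m <package>`.
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts a single positional prompt."""
    parser = argparse.ArgumentParser(prog="python -m gemimg")
    parser.add_argument("prompt", help="text prompt describing the image")
    return parser


def generate(prompt: str) -> str:
    """Hypothetical placeholder: a real CLI would call the library here."""
    return f"would generate an image for: {prompt}"


def main(argv: Optional[List[str]] = None) -> int:
    """Parse arguments (sys.argv when argv is None) and run the command."""
    args = build_parser().parse_args(argv)
    print(generate(args.prompt))
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

A project.scripts entry in pyproject.toml maps a command name to a callable in `module:function` form, so it could point at the same `main` function (for instance something like `gemimg.__main__:main`, which is an assumed path here). That is why the two approaches can coexist: the entry only adds a command alias on top of the `python -m` behaviour.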