Comment by vunderba
8 days ago
> I also created a small editing suite for myself where I can draw bounding boxes on images when they aren't perfect, and have them fixed. Either just with a prompt, or by feeding them to Claude as an image and having it write the prompt to fix the issue for me (as a workflow on the API)
Are you talking about Automatic1111 / ComfyUI inpainting masks? Because Nano doesn't accept bounding boxes as part of its API unless you just stuff the literal X/Y coordinates into the raw prompt.
You could do something where you draw a bounding box and, when you get the response back from Nano, mask that section back over the original image, using a decent upscaler as necessary in the event that Nano had to reduce the original image down to ~1MP.
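For what it's worth, that paste-back step is simple with Pillow. A minimal sketch, assuming the edit comes back at a different size; the file names and box are placeholders, and a real upscaler (ESRGAN or similar) would replace the plain Lanczos resize:

```python
from PIL import Image

def composite_edit(original_path: str, edited_path: str, box: tuple) -> Image.Image:
    """box = (left, top, right, bottom) in original-image pixel coordinates."""
    original = Image.open(original_path)
    edited = Image.open(edited_path)

    # Nano may have downscaled to ~1MP; bring the edit back to the
    # original dimensions first (swap in a proper upscaler here).
    if edited.size != original.size:
        edited = edited.resize(original.size, Image.LANCZOS)

    # Paste only the edited region back over the untouched original.
    original.paste(edited.crop(box), box[:2])
    return original

composite_edit("original.png", "nano_output.png", (120, 80, 480, 360)).save("merged.png")
```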
No, I am using my own workflows and software for this. I made nano-banana accept my bounding boxes; everything is possible with some good prompting: https://edwin.genego.io/blog/lpa-studio (there are some videos there of an earlier version, taken while I am editing a story). Either send the coords and describe the location well, or draw the box on the image and tell it to return the image without the drawn box and with only the requested changes (see the sketch after the timings below).
It also works well if you draw a bounding box on the original image, then ask Claude for a meta-prompt that deconstructs the changes into a much more detailed prompt, and then send the original image (without the boxes) for the changes. It really depends on the changes you need and how long you're willing to wait:
- normal image editing response: 12-14s
- image editing response with Claude meta-prompting: 20-25s
- image editing response with Claude meta-prompting plus image deconstruction and prompt re-construction: 40-60s
(I use Replicate though, so the actual API may be much faster).
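For the coords-in-prompt variant, here is a rough sketch using Replicate's Python client. I'm assuming the model is hosted as google/nano-banana with prompt and image_input fields (check the model page); the box, prompt, and URL are illustrative:

```python
import replicate  # requires REPLICATE_API_TOKEN in the environment

box = (120, 80, 480, 360)  # left, top, right, bottom, pixel coordinates
prompt = (
    f"Edit only the region inside the bounding box x1={box[0]}, y1={box[1]}, "
    f"x2={box[2]}, y2={box[3]} (top-left origin): replace the broken lamp "
    "with a lit candle. Leave everything outside that region unchanged."
)

output = replicate.run(
    "google/nano-banana",  # assumed model slug on Replicate
    input={
        "prompt": prompt,
        "image_input": ["https://example.com/scene.png"],  # placeholder URL
    },
)
print(output)  # typically a URL / file handle for the edited image
```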
This way you can also move into new views of a scene by zooming the image in or out on the same aspect-ratio canvas and asking it to generatively fill the white borders. So you can go from a tight inside shot to viewing the same scene from outside a house window, or from inside the car to outside the car.
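The zoom-out half of that is just compositing before you prompt: shrink the frame onto a white canvas with the same aspect ratio, then ask the model to fill the border. A minimal sketch with Pillow; the scale factor and file names are placeholders:

```python
from PIL import Image

def zoom_out_canvas(path: str, scale: float = 0.6) -> Image.Image:
    img = Image.open(path)
    canvas = Image.new("RGB", img.size, "white")  # same size, same aspect ratio
    small = img.resize((int(img.width * scale), int(img.height * scale)))
    # Center the shrunken frame, leaving a white border to outpaint.
    canvas.paste(small, ((canvas.width - small.width) // 2,
                         (canvas.height - small.height) // 2))
    return canvas

# Send the result with a prompt like:
# "Generatively fill the white border so the scene extends naturally."
zoom_out_canvas("inside_shot.png").save("zoom_out_input.png")
```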
Thanks, that makes sense. I'll have to give the "red bounding box overlay" a shot when there are a lot of similar objects in the existing image.
I also have a custom pipeline/software that takes in a given prompt, rewrites it using an LLM into multiple variations, sends it to multiple GenAI models, and then uses a VLM to evaluate them for accuracy. It runs in an automated REPL style, so I can be relatively hands-off, though I do have a "max loop limiter" since I'd rather not spend the equivalent of a small country's GDP.
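In outline, a loop like that can be quite small. A sketch under stated assumptions: rewrite_prompt, generate, and vlm_score are hypothetical stand-ins for the LLM, image-model, and VLM calls, and the threshold and loop cap are arbitrary:

```python
def rewrite_prompt(prompt: str, n: int) -> list[str]:
    raise NotImplementedError  # LLM call: return n rewritten variants

def generate(model: str, prompt: str):
    raise NotImplementedError  # GenAI call: return one image

def vlm_score(image, intent: str) -> float:
    raise NotImplementedError  # VLM call: 0..1 accuracy vs. the original intent

MAX_LOOPS = 5  # the "max loop limiter", so costs stay bounded

def refine(base_prompt: str, models: list[str], target: float = 0.9):
    current, best = base_prompt, None
    for _ in range(MAX_LOOPS):
        variants = rewrite_prompt(current, n=4)
        candidates = [(m, v, generate(m, v)) for m in models for v in variants]
        # Score every candidate against the *original* intent, keep the best.
        scored = sorted(((vlm_score(img, base_prompt), m, v, img)
                         for m, v, img in candidates),
                        key=lambda t: t[0], reverse=True)
        best = scored[0]
        if best[0] >= target:  # good enough; stop early
            break
        current = best[2]      # refine around the winning variant next pass
    return best
```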
Automated generator-critique loops for evaluation can be really useful for building your own style libraries, because it's easy for an LLM agent to evaluate how close an image is to a reference style or scene. So you end up with a series of base prompts and can then replicate that style across a whole franchise of stories. Most people still do it with reference images, and that doesn't really produce very stable results. If you do need some help with bounding boxes for nano-banana, feel free to send me a message!
What framework are you using to generate your documentation? It looks amazing.
I am using Django and HTML (JS: AlpineJS & HTMX). Each page is just created from scratch rather than from a CMS or template; I use Claude Code for that (with mem0.ai as an MCP) and build my entire development workspace and workflow around / into my website.
You can literally just open the image up in Preview or whatever, add a red box, circle, etc., and then say "in the area with the red square, make change foo", and it will normally remove the red box from the generated image. Whether or not it actually makes the change you want to see is another matter though. It's been very hit or miss for me.
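If you'd rather script that marker than draw it in Preview, the same overlay is a couple of lines of Pillow; the coordinates are placeholders:

```python
from PIL import Image, ImageDraw

img = Image.open("scene.png").convert("RGB")
ImageDraw.Draw(img).rectangle((120, 80, 480, 360), outline="red", width=6)
img.save("scene_marked.png")
# Then prompt: "In the area with the red rectangle, make change foo,
# and remove the red rectangle from the output."
```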
Yeah I could see that being useful if there were a lot of similar elements in the same image.
I also had similarly mixed results wrt Nano-banana, especially around asking it to "fix/restore" things (a character's hand was an anatomical mess, for example).