Comment by toddmorey

2 months ago

Confident idiot: I’m exploring using an LLM for diagram creation.

I’ve found that after about 3 prompts to edit an image with Gemini, it will randomly respond with an entirely new image. Another quirk is that it will respond “here’s the image with those edits” with no edits made. It’s like a toaster that catches fire every eighth or ninth time.

I am not sure how to mitigate this behavior. I think maybe an LLM-as-a-judge step with vision could evaluate the output before passing it on to the poor user.
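To make the judge idea a bit more concrete, here is a minimal sketch of what such a gate could look like. It is only an illustration under assumptions: images are taken to be already decoded into flat grayscale pixel lists, the thresholds are made-up starting points, and `judge` is a hypothetical callable that would wrap whatever vision model call you use. The cheap pixel diff catches the two failure modes described above (no edits at all, or an entirely new image) before the judge is ever asked.

```python
from typing import Callable, Sequence

Pixels = Sequence[int]  # flat grayscale values 0-255 (assumed already decoded)


def changed_fraction(before: Pixels, after: Pixels, tolerance: int = 8) -> float:
    """Return the fraction of pixels that differ by more than `tolerance`."""
    if len(before) != len(after):
        return 1.0  # different dimensions: treat as fully changed
    if not before:
        return 0.0
    changed = sum(1 for a, b in zip(before, after) if abs(a - b) > tolerance)
    return changed / len(before)


def gate_edit(
    before: Pixels,
    after: Pixels,
    judge: Callable[[Pixels, Pixels], bool],  # hypothetical vision-model judge
    min_change: float = 0.001,  # assumed threshold: below this, nothing was edited
    max_change: float = 0.6,    # assumed threshold: above this, it is a new image
) -> str:
    """Decide whether an edited image should be shown to the user."""
    frac = changed_fraction(before, after)
    if frac < min_change:
        return "retry: no visible edit"
    if frac > max_change:
        return "retry: looks like a new image"
    # Only the plausible middle range is worth the cost of a judge call.
    return "accept" if judge(before, after) else "retry: judge rejected"
```

The thresholds would need tuning per use case, since a legitimate edit such as "change the background" can touch most of the pixels.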

8 comments

codazoda 2 months ago

I had a similar result trying to create 16 similarly styled images. After half a dozen it just started kicking out the same image over and over again no matter what the prompt said. Even the “thinking” looked right, but the image was just a repeat. I don’t know if this is some type of context limitation or what.

I got around it by using a new prompt/context for each image. This required some rethinking about how to make them match. What I did was create a sprite sheet with the first prompt, and then for each image I only replaced (edited) the second prompt.

I still got some consistency problems because there were a few important details left out of my sprite sheet. Next time I think I’ll create those individually and then attach them as context for additional prompts.
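A rough sketch of that workflow, in case it helps anyone: one fresh request per image, with the same style prompt and the same reference images attached every time, and no chat history carried over. Everything here is hypothetical scaffolding, not any real API: `Request`, `build_requests`, and the `generate` callable are names invented for illustration, and `generate` stands in for whatever image model call you actually use.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Request:
    """One self-contained generation request (hypothetical structure)."""
    prompt: str
    references: Tuple[bytes, ...]  # style reference images, e.g. the sprite sheet


def build_requests(
    style_prompt: str,
    subjects: Sequence[str],
    references: Sequence[bytes],
) -> List[Request]:
    """Pair the shared style prompt and references with each subject."""
    return [
        Request(prompt=f"{style_prompt}\n\nSubject: {subject}",
                references=tuple(references))
        for subject in subjects
    ]


def generate_all(
    requests: Sequence[Request],
    generate: Callable[[Request], bytes],  # hypothetical model call
) -> List[bytes]:
    """Run every request independently so no context accumulates between images."""
    return [generate(request) for request in requests]
```

Only the subject line changes between requests, so any drift has to come from the model rather than from a growing conversation.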

toddmorey 2 months ago

Oh smart. This is good guidance. Yeah, fascinating how longer-running context causes these side effects, especially the repeated-image-with-no-changes bug.

RationPhantoms 2 months ago

What are your thoughts on the diagram-as-code movement? I'd prefer to have an LLM use that, as it can at least drive some determinism through it, rather than deal with the slippery layer that is prompt control for visual LLMs.

toddmorey 2 months ago

I think that's the right approach and what I've been experimenting with. Diagram as code and then style transfer from output diagram to desired look. That's where I've had the most success.

codingdave 2 months ago

Have you considered that perhaps such things simply are not within its capabilities?

toddmorey 2 months ago

I mean, one of its flagship features is to make precise edits to images. And it's really good at it... until it randomly isn't.

user34283 2 months ago

Yes, same here.

I don't know if it's a fault with the model or just a bug in the Gemini app.

dominotw 2 months ago

Same. I gave it a very well hand-drawn floor plan, but it never seems to be able to create a formal version of it. It's very, very simple too.

It makes hilarious mistakes like putting the toilet right in the middle of the living room.

I don't get all the hype. Am I stupid?