Comment by fariszr

8 months ago

This is the GPT-4 moment for image editing models. Nano Banana, a.k.a. Gemini 2.5 Flash Image, is insanely good. It made a 171-point Elo jump on LMArena!

Just search "nano banana" on Twitter to see the crazy results. Here's an example: https://x.com/D_studioproject/status/1958019251178267111

I've been testing it for several weeks. It can produce results that are truly epic, but it's still a case of rerolling the prompt a dozen times to get an image you can use. It's not God. It's definitely an enormous step though, and totally SOTA.

  • If you compare to the amount of effort required in Photoshop to achieve the same results, still a vast improvement

  • The model seems good, but it still has huge issues and produces garbage most of the time, lol.

    Still needs more RLHF tuning, I guess? The previous version was even worse.

  • Is it because the model is not good enough at following the prompt, or because the prompt is unclear?

    Something similar has been the case with text models: people write vague instructions and are dissatisfied when the model doesn't correctly guess their intentions. With image models it's even harder for the model to guess right without enough detail.

    • Remember in image editing, the source image itself is a huge part of the prompt, and that's often the source of the ambiguity. The model may clearly understand your prompt to change the color of a shirt, but struggle to understand the boundaries of the shirt. I was just struggling to use AI to edit an image where the model really wanted the hat in the image to be the hair of the person wearing it. My guess for that bias is that it had just been trained on more faces without hats than with them on.

    • No, my prompts are very, very clear. It just won't follow them sometimes. Also this model seems to prefer shorter prompts, in my experience.

Before AI, people complained that Google was taking world class engineering talent and using it for little more than selling people ads.

But look at that example. With this new frontier of AI, that world class engineering talent can finally be put to use…for product placement. We’ve come so far.

  • > finally be put to use…for product placement.

    Did you think that Google would just casually allow their business to be disrupted without using the technology to improve the business and also protecting their revenue?

    Both Meta and Google have indicated that they see generative AI as a way to vertically integrate within the ad space, disrupting marketing teams, copywriters, and other roles that monitor or improve ad performance.

    Also FWIW, I would suspect that the majority of Google engineers don't work on an ad system, and probably don't even work on a profitable product line.

  • Oh come on - you have this incredible technology at your disposal and all you can think to use it for is product placement?

  • I am pretty sure a lot of said engineering talent isn't actually contributing to AI but doing other stuff

Another nitpick: the pink puffer jacket that got edited into the picture is not the same as the one in the reference image. It's very similar, but if I were using this model for product placement, or cared about this sort of detail, I'd definitely have issues with it.

  • Even in the just-photoshop-not-ai days product photos had become pretty unreliable as a means of understanding what you're buying. Of course it's much worse now.

    • Note: Please understand that monitor may color different. If image does not match product received then kindly your monitor calibration. Seller not responsible. /ebay&amazon


Alarming hands on the third one: it can't decide which way they're facing. But Gemini didn't introduce that, it's there in the base image.

  • Yes, the base image's hands are creepy.

    • I noticed the AI pattern on the sunglasses first. I guess all of the source images are AI-generated? In a sense, that makes the result slightly less impressive -- is it going to be as faithful to the original image when the input isn't already a highly likely output for an AI model? Were the input images generated with the same model that's being used to manipulate them?


It seems like every combination of "nano banana" is registered as a domain with its own unique UI for image generation... are these all middlemen playing credit arbitrage on a popular model name?

  • I'd assume they're just fake: they take your money and use a different model under the hood. They already existed before the public release, and I doubt their backends rolled the dice on LMArena until nano-banana popped up. That was the only way to use it until today.

    • Agreed, I didn't mean to imply that they were even attempting to run the actual nano banana, even through LMarena.

      There is a whole spectrum of potential sketchiness to explore with these, since I see a few "sign in with Google" buttons that remind me of phishing landing pages.

  • They're almost all scams. Nano banana AI image generator sites were showing up when this model was still only available in LM Arena.

Completely agree. I make logos for my GitHub projects for fun, and the last time I tried SOTA image generation for logos, it consistently ignored instructions and did nothing close to what I was asking for. Google's new release today did it near flawlessly, exactly how I wanted it, in a single prompt. A couple more prompts for tweaking (centering it, rotating it slightly) got it perfect. This is awesome.

Regardless, it seems Google is on the frontier of every type of model, plus robotics (cars). It's nutty how we forget what an intellectual juggernaut they are.

I wonder what the creative workflow will look like when these kinds of models are natively integrated into digital image tools. Imagine fine-grained controls on each layer and its composition, with semantic understanding of the full picture.

Why is it called nano banana?

No, it's not really that much of an improvement. Once you start coming up with specific tasks, it fails just like the others.

> This is the GPT-4 moment for image editing models.

No it's not.

We've had rich editing capabilities since gpt-image-1, this is just faster and looks better than the (endearingly? called) "piss filter".

Flux Kontext, SeedEdit, and Qwen Edit are all also image editing models that are robustly capable. Qwen Edit especially.

Flux Kontext and Qwen are also possible to fine tune and run locally.

Qwen (and its video gen sister Wan) are also Apache licensed. It's hard not to cheer Alibaba on given how open they are compared to their competitors.

We've left the days of Dall-E, Stable Diffusion, and Midjourney of "prompt-only" text to image generation.

It's also looking like tools like ComfyUI are less and less necessary as those capabilities are moving into the model layer itself.

  • In other words, this is the GPT-4 moment for image editing models.

    GPT-4 isn't "fundamentally different" from GPT-3.5; it's just better. That's exactly the point the parent commenter was making.

  • I'm confused as well. I thought gpt-image could already do most of these things, but I guess the key difference is that gpt-image is not good for single-point edits. In terms of "wow" factor it doesn't feel as big as GPT-3 to GPT-4, though, since it sure _felt_ like models could already do this.

    • People really slept on gpt-image-1 and were too busy making Miyazaki/Ghibli images.

      I feel like most of the people on HN are paying attention to LLMs and missing out on all the crazy stuff happening with images and videos.

      LLMs might be a bubble, but images and video are not. We're going to have entire world simulation in a few years.