Comment by fariszr
8 months ago
This is the GPT-4 moment for image editing models. Nano Banana, aka Gemini 2.5 Flash, is insanely good. It made a 171-point Elo jump on LMArena!
Just search "nano banana" on Twitter to see the crazy results. An example: https://x.com/D_studioproject/status/1958019251178267111
I've been testing it for several weeks. It can produce results that are truly epic, but it's still a case of rerolling the prompt a dozen times to get an image you can use. It's not God. It's definitely an enormous step though, and totally SOTA.
If you compare it to the amount of effort required in Photoshop to achieve the same results, it's still a vast improvement.
I work in Photoshop all day, and I 100% agree. Also, I just retried a task that wouldn't work last night on nano-banana, and it worked the first time on the released model, so I'm wondering whether there were some changes in the released version?
Vibe coding might not be real, but vibe graphic design certainly is.
https://imgur.com/a/internet-DWzJ26B
Anyone can make images and video now.
Why would you compare it to Photoshop? If you compare it to other tools in the same category, image generation, you will find that models like Flux and Qwen do much better.
The model seems good, but it still has huge issues, producing garbage most of the time lol.
Still needs more RLHF tuning, I guess? The previous version was even worse.
Is it because the model is not good enough at following the prompt, or because the prompt is unclear?
Something similar has been the case with text models: people write vague instructions and are dissatisfied when the model doesn't correctly guess their intentions. With image models it's even harder for the model to guess right without enough detail.
Remember in image editing, the source image itself is a huge part of the prompt, and that's often the source of the ambiguity. The model may clearly understand your prompt to change the color of a shirt, but struggle to understand the boundaries of the shirt. I was just struggling to use AI to edit an image where the model really wanted the hat in the image to be the hair of the person wearing it. My guess for that bias is that it had just been trained on more faces without hats than with them on.
No, my prompts are very, very clear. It just won't follow them sometimes. Also this model seems to prefer shorter prompts, in my experience.
How did you get early access? Thanks.
I believe via LMArena.
Before AI, people complained that Google was taking world class engineering talent and using it for little more than selling people ads.
But look at that example. With this new frontier of AI, that world class engineering talent can finally be put to use…for product placement. We’ve come so far.
> finally be put to use…for product placement.
Did you think that Google would just casually allow their business to be disrupted without using the technology to improve the business and protect their revenue?
Both Meta and Google have indicated that they see generative AI as a way to vertically integrate within the ad space, disrupting marketing teams, copywriters, and other roles that monitor or improve ad performance.
Also FWIW, I would suspect that the majority of Google engineers don't work on an ad system, and probably don't even work on a profitable product line.
Oh come on - you have this incredible technology at your disposal and all you can think to use it for is product placement?
I am pretty sure a lot of said engineering talent isn't actually contributing to AI but doing other stuff
Another nitpick: the pink puffer jacket that got edited into the picture is not the same as the one in the reference image. It's very similar, but if I were to use this model for product placement, or cared about this sort of detail, I'd definitely have issues with it.
Even in the just-photoshop-not-ai days product photos had become pretty unreliable as a means of understanding what you're buying. Of course it's much worse now.
Note: Please understand that monitor may color different. If image does not match product received then kindly your monitor calibration. Seller not responsible. /ebay&amazon
Alarming hands on the third one: it can't decide which way they're facing. But Gemini didn't introduce that, it's there in the base image.
Yes, the base image's hands are creepy.
I noticed the AI pattern on the sunglasses first. I guess all of the source images are AI-generated? In a sense, that makes the result slightly less impressive -- is it going to be as faithful to the original image when the input isn't already a highly likely output for an AI model? Were the input images generated with the same model that's being used to manipulate them?
It seems like every permutation of "nano banana" is registered as a domain with its own unique UI for image generation... are these all middlemen running credit arbitrage on a popular model name?
I'd assume they're just fakes that take your money and use a different model under the hood, because they already existed before the public release. I doubt their backends rolled the dice on LMArena until nano-banana popped up, and that was the only way to use it until today.
Agreed, I didn't mean to imply that they were even attempting to run the actual nano banana, even through LMarena.
There is a whole spectrum of potential sketchiness to explore with these, since I see a few "sign in with Google" buttons that remind me of phishing landing pages.
They're almost all scams. Nano banana AI image generator sites were showing up when this model was still only available in LM Arena.
Completely agree - I make logos for my GitHub projects for fun, and the last time I tried SOTA image generation for logos, it was consistently ignoring instructions and not doing anything close to what I was asking for. Google's new release today did it near flawlessly, exactly how I wanted it, in a single prompt. A couple more prompts for tweaking (centering it, rotating it slightly) got it perfect. This is awesome.
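For anyone curious what that flow looks like through the API instead of the app, here's a minimal sketch. It assumes the google-genai Python SDK, a GEMINI_API_KEY in the environment, and "gemini-2.5-flash-image-preview" as the model id; the prompts and file names are just illustrative.

    # Minimal sketch: generate a logo, then tweak it with a follow-up edit prompt.
    # Assumptions: google-genai SDK installed, GEMINI_API_KEY set in the environment,
    # and "gemini-2.5-flash-image-preview" as the model id.
    from google import genai
    from google.genai import types
    from PIL import Image

    client = genai.Client()  # picks up the API key from the environment

    def save_image_parts(response, path):
        # The model can return interleaved text and image parts; keep the image bytes.
        for part in response.candidates[0].content.parts:
            if part.inline_data is not None:
                with open(path, "wb") as f:
                    f.write(part.inline_data.data)

    # First pass: text-to-image generation of the logo.
    first = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",
        contents=["Minimalist flat logo for a CLI tool: a terminal cursor "
                  "shaped like a leaf, two colors, plain background."],
        config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
    )
    save_image_parts(first, "logo_v1.png")

    # Second pass: pass the image back with an edit instruction.
    second = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",
        contents=[Image.open("logo_v1.png"),
                  "Center the mark and rotate it slightly clockwise."],
        config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
    )
    save_image_parts(second, "logo_v2.png")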
Regardless, it seems Google is on the frontier of every type of model, and of robotics (cars). It's nutty how we forget what an intellectual juggernaut they are.
Tool use and sycophancy are still big issues in the Gemini 2.5 models.
I wonder what the creative workflow will look like when these kinds of models are natively integrated into digital image tools. Imagine fine-grained control over each layer and its composition, with semantic understanding of the full picture.
Why is it called nano banana?
Before a model is announced, they use codenames on the arenas. If you look online, you can see people posting about new secret models and people trying to guess whose model it is.
What are "the arenas"?
Engineers often have silly project names internally, then some marketing team rewrites the name for public release.
I'm pretty sure it's because an image of a banana under a microscope, generated by the model, went super viral.
Or was that just marketing?
Oh no, even more mis-scaled product images.
No, it's not really that much of an improvement. Once you start coming up with specific tasks, it fails just like the others.
> An example. https://x.com/D_studioproject/status/1958019251178267111
“Nano banana” is probably good, given its score on the leaderboard, but the examples you show don't seem particularly impressive; it looks like what Flux Kontext or Qwen Image already do well.
The fingernails on one of them. Ohhh nooo
Image genai made me realize just how inattentive to detail a lot of people are.
Yet it's failed spectacularly at almost everything I've given it.
nano banana is good, but not insanely good
Be gone scammer
cope
> This is the gpt 4 moment for image editing models.
No it's not.
We've had rich editing capabilities since gpt-image-1; this is just faster and looks better than the (endearingly?) nicknamed "piss filter".
Flux Kontext, SeedEdit, and Qwen Edit are all also image editing models that are robustly capable. Qwen Edit especially.
Flux Kontext and Qwen are also possible to fine-tune and run locally.
Qwen (and its video gen sister Wan) are also Apache licensed. It's hard not to cheer Alibaba on given how open they are compared to their competitors.
We've left behind the "prompt-only" text-to-image days of DALL-E, Stable Diffusion, and Midjourney.
It's also looking like tools like ComfyUI are less and less necessary as those capabilities are moving into the model layer itself.
In other words, this is the gpt 4 moment for image editing models.
GPT-4 isn't "fundamentally different" from GPT-3.5. It's just better. That's the exact point the parent commenter was trying to make.
I'd say it's more like comparing Sonnet 3.5 to Sonnet 4. GPT-4 was a rather fundamental improvement: it opened up professional applications, whereas ChatGPT 3.5 was only good for casual use.
Did you see the generated pic Demis posted on X? It looks like slop from 2 years ago. https://x.com/demishassabis/status/1960355658059891018
I'm confused as well; I thought gpt-image could already do most of these things, but I guess the key difference is that gpt-image is not good for single-point edits. In terms of "wow" factor it doesn't feel as big as GPT-3 to GPT-4, though, since it sure _felt_ like models could already do this.
People really slept on gpt-image-1 and were too busy making Miyazaki/Ghibli images.
I feel like most of the people on HN are paying attention to LLMs and missing out on all the crazy stuff happening with images and videos.
LLMs might be a bubble, but images and video are not. We're going to have entire world simulation in a few years.
I'm sorry I absolutely don't agree. This model is on a whole other level.
It's not even close. https://twitter.com/fareszr/status/1960436757822103721
I'm totally with you. Dismayed by all these fanbois.