Comment by dangoodmanUT

1 day ago

I've had nano banana pro for a few weeks now, and it's the most impressive AI model I've ever seen

The inline verification of images following the prompt is awesome, and you can do some _amazing_ stuff with it.

It's probably not as fun anymore, though (in the early access program it didn't have censoring!)

Genuinely believe that images are 99.5% solved now; unless you're extremely keen-eyed, you won't be able to tell AI images from real ones.

  • Eyebrows, eyelashes, and skin texture are still a dead giveaway for AI-generated portraits. With everything else it's much harder to tell the difference.

I'd be curious how well the inline verification works - an easy test is to have it generate a 9-pointed star, a classic case that many SOTA models have difficulty with.
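
For reference, a correct nine-pointed star is easy to construct with plain trigonometry. Here's a minimal sketch (radii and sizes arbitrary) that writes one out as an SVG, so you have ground truth to compare a model's output against:

  # Build a nine-pointed star by alternating outer and inner vertices
  # around a circle, then write it out as a small SVG.
  import math

  POINTS = 9
  outer, inner, cx, cy = 90.0, 40.0, 100.0, 100.0

  vertices = []
  for i in range(2 * POINTS):
      r = outer if i % 2 == 0 else inner
      angle = i * math.pi / POINTS - math.pi / 2  # start at the top
      vertices.append((cx + r * math.cos(angle), cy + r * math.sin(angle)))

  path = " ".join(f"{x:.1f},{y:.1f}" for x, y in vertices)
  svg = (f'<svg xmlns="http://www.w3.org/2000/svg" width="200" height="200">'
         f'<polygon points="{path}" fill="gold" stroke="black"/></svg>')

  with open("nine_pointed_star.svg", "w") as f:
      f.write(svg)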

In the past, I've deliberately stuck a vision-language model in a REPL loop against generative models, having it verify the output and retry, because of this exact issue.
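
A minimal sketch of that loop, with hypothetical generate_image() and vlm_critique() helpers standing in for whatever image model and VLM you actually wire up (nothing here is a real API):

  # Hypothetical stand-ins for real model calls; wire up your own.
  def generate_image(prompt, feedback=None):
      """Call an image model; any critique from the previous round
      is folded into the prompt on retries."""
      raise NotImplementedError

  def vlm_critique(image, prompt):
      """Ask a VLM whether the image follows the prompt; returns
      (passed, critique), e.g. (False, "ten points, not nine")."""
      raise NotImplementedError

  def generate_with_verification(prompt, max_attempts=5):
      feedback = None
      for _ in range(max_attempts):
          image = generate_image(prompt, feedback)
          passed, critique = vlm_critique(image, prompt)
          if passed:
              return image
          feedback = critique  # feed the objection into the next attempt
      raise RuntimeError(f"no compliant image after {max_attempts} attempts")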

EDIT: Just tested it in Gemini - it either didn't use a VLM to actually look at the finished image or the VLM itself failed.

Output:

  I have finished cross-referencing the image against the user's specific requests. The primary focus was on confirming that the number of points on the star precisely matched the requested nine. I observed a clear visual representation of a gold-colored star with the exact point count that the user specified, confirming a complete and precise match.

Result:

  Bog standard star with *TEN POINTS*.

"Inline verification of images following the prompt is awesome, and you can do some _amazing_ stuff with it." - could you elaborate on this? sounds fascinating but I couldn't grok it via the blog post (like, it this synthid?)

  • It uses Gemini 3 inline with the reasoning to make sure it followed the instructions before giving you the output image

LLMs might be a dead end, but we're going to have amazing images, video, and 3D.

To me the AI revolution is making visual media (and music) catch up with the text-based revolution we've had since the dawn of computing.

Computers accelerated typing and text almost immediately, but our tools for images, video, and 3D have remained really crude despite decades of graphics and image-processing algorithms.

AI really pushes the envelope here.

I think images/media alone could save AI from "the bubble" as these tools enable everyone to make incredible content if you put the work into it.

Everyone now has the ingredients of Pixar and a music production studio in their hands. You just need to learn the tools and put the hours in, and you can make chart-topping songs and Hollywood-grade VFX. The models won't get you there by themselves, but using them in conjunction with other tools and an understanding of what makes good art - that can and will do it.

Screw ChatGPT, Claude, Gemini, and the rest. This is the exciting part of AI.

  • I wouldn't call LLMs a dead end; they're so useful as-is

    • LLMs are useful, but they've hit a wall on the path to automating our jobs; rising benchmark scores mostly mean they're getting better at test taking. I don't see them replacing software engineers without overcoming that obstacle.

      AI for images, video, and music - these tools can already produce movies, games, and songs today with just a little effort from domain experts. They're 10,000x time and cost savers, and the models and tools are continuing to improve on an obvious trend line.

  • Doesn't seem like a dead end at all. Once we can apply LLMs to the physical world and their outputs control robot movements, it's essentially game over for 90% of the things humans do, AGI or not.