Comment by bel8

17 hours ago

It probably works similarly to how Gemini has worked on Android for a while now.

You can point at or select anywhere on the screen, and it understands and searches the context. If you select a text block, even text inside an image, it lets you copy the text or search for it online. Otherwise it can search the image itself.

I use it often. It's intuitive and fast even on non-flagship phones.

I'd wager their A/B tests went well enough to warrant a port from phones to their new "Chromebook".

Their video is completely different from what Gemini does now. It analyses mouse movements, like circling things, underlining them, or pointing at them to indicate where they need to go. It's a lot like the interfaces you see in sci-fi movies, where generic gestures are understood in context in a way that modern computers can't handle.

  • > circling around things, underlining things with the mouse

    Do we use the same Android Gemini assistant?

    Because the one I use does that, and it has object detection smart enough to be intuitive. It usually gets it right when I point at something on the screen. And when it doesn't, I can circle the thing or just click again.

    In this Instagram post, for example, it automatically highlighted the entire person, but I wanted to know about the shoes. I clicked once on the shoes, and it knew exactly what I wanted and gave me the info in about two seconds:

    https://imgur.com/a/lHUeciy

    This is useful to non-tech-savvy folks, not just to us hackers.

    • Google's Gemini features differ massively per region. There's a good chance privacy laws prevent Google from providing me with the same Gemini you use.

      Object detection is mediocre at best. Circling things and using their AI editing features works, but the artefacts confuse Lens and other image-parsing systems. Extracting objects from images usually works, but it's not on par with what Apple had long before Google built it.

      The difference remains that the Gemini app on Android requires explicit activation. You cannot tap a button or click a link while you're on the Gemini screen.

      The video isn't on the linked page anymore, but it's here: https://deepmind.google/blog/ai-pointer/ and here: https://www.youtube.com/watch?v=pZNzfQLgGsA

      It's an absolute privacy nightmare for most people, but if we ever get enough RAM and compute to run this stuff locally, I think this could actually create a new paradigm for user interaction: something with Lisp-machine-style self-customisability, but for people who don't know anything about computers.

      And if it doesn't work, it'll be the most horrific, messy, useless UI humanity has ever invented, and we all get a new funny meme to laugh about Google. Win-win!