Comment by kweingar

2 years ago

The video itself and the video description give a disclaimer to this effect. Agreed that some will walk away with an incorrect view of how Gemini functions, though.

Hopefully realtime interaction will be part of an app soon. Doesn’t seem like there would be too many technical hurdles there.

The entirety of the disclaimer is "sequences shortened throughout", in tiny text at the bottom for two seconds.

They do disclose most of the details elsewhere, but the video itself is produced and edited in such a way that it's extremely misleading. They really want you to think that it's responding in complex ways to simple voice prompts and a video feed, and it's just not.

  • Yeah, of all the edits in the video, the editing for timing is the least concerning. My gripe is that the prompting was different, and to get that information you have to watch the video on YouTube specifically, expand the description, and click a link to a separate blog article. Linking a "making of" video where they show this and interview some of the minds behind Gemini would have been better PR.

The disclaimer in the description is "For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity."

That's different from "Gemini was shown selected still images and not video".

  • What I found impressive about it was the voice, the fast real-time response to video, and the succinct responses. So apparently all of that was fake. You got me, Google.

People don't really pay attention to disclaimers. Google made a choice knowing people would remember the hype, not the disclaimer.

  • I remember watching it and being pretty impressed, but later, while walking around and thinking it over, I concluded that there was something fishy about the demo. I didn't know exactly what they had fudged, but it was far too polished compared with how well their current AI demos perform.

    I'm not saying there have been no improvements in AI. There have, and that includes Google. But the reason ChatGPT has really taken over the world is that the demo is in your own hands, and it does quite well there.

    • Indeed, and this is how Google used to be as a company. I remember when Google Maps & Earth launched, and how they felt like world-changing technology. I'm sure they're still doing lots of innovative science and development, but it's an advertising/services company now, and one that increasingly talks down to its users. Disappointing, considering their early sense of mission.

      Thinking back to the firm's early days, it strikes me that some HN users and perhaps even some Googlers have no memory of a time before Google Maps and simply can't imagine how disruptive and innovative things like that were at the time. Being able to browse satellite imagery for the whole world was something previously confined to the upper echelons of the military-industrial complex.

      That's one reason I wish the firm (along with several other tech giants) were broken up; it's full of talented innovative people, but the advertising economics at the core of their business model warp everything else.

  •     :%s/Google/the team
        :%s/people/the promotion board
    

    Conway's law applied to the corporate-public interface :)

No. The disclaimer was not nearly enough.

The video fooled many people, including myself. This was not your typical super optimized and scripted demo.

This was blatant false advertising. Showing capabilities that do not exist. It’s shameful behavior from Google, to be perfectly honest.

Yeah, and ads on Google search have the teeniest, tiniest little "ad" chip on them, the result of a long progression toward making ads more in-your-face and less clearly distinguished from organic results.

In my estimation, given the context around AI-generated content and general fakery, this video was deceptive. The only impressive thing about the video (to me) was how snappy and fluid it seemed to be, presumably processing video in real time. None of that was real. It's borderline fraudulent.

They were just parroting this video on CNBC without any disclaimers, so the viewers who don't happen to also read hacker news will likely form a different opinion than those of us who do.

The difference between “Hey, figure out a game based on what you see right now” and “here is a description of a game, with the only two possible outcomes as examples” cannot be explained away by the disclaimer.

Aren't performance and cost hurdles?

  • It can be realtime while still having more latency than depicted in the video (and the video clearly stated that Gemini does not respond that quickly).

    A local model could send relevant still images from the camera feed to Gemini, along with the text transcript of the user’s speech. Then Gemini’s output could be read aloud with text-to-speech. Seems doable within the present cost and performance constraints.
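A minimal sketch of that pipeline in Python, under loud assumptions: the "local model" is swapped for a simple frame-difference heuristic (a frame is kept only when it differs enough from the last one sent), frames are flat lists of grayscale pixel values, and `FrameSelector`, `run_turn`, `ask_model`, and `speak` are all hypothetical names. No real Gemini or text-to-speech API is shown; those are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

# Hypothetical frame format: a grayscale image as a flat sequence of 0-255 values.
Frame = Sequence[int]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel difference between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


@dataclass
class FrameSelector:
    """Stand-in for the local model: keeps a frame only when the scene changed enough."""
    threshold: float = 20.0
    last_sent: Optional[Frame] = None

    def is_relevant(self, frame: Frame) -> bool:
        # Always keep the first frame; afterwards keep only frames that differ
        # from the last kept frame by more than the threshold.
        if self.last_sent is None or mean_abs_diff(frame, self.last_sent) > self.threshold:
            self.last_sent = frame
            return True
        return False


def run_turn(
    frames: List[Frame],
    transcript: str,
    selector: FrameSelector,
    ask_model: Callable[[List[Frame], str], str],  # hypothetical wrapper around a multimodal model API
    speak: Callable[[str], None],                  # hypothetical text-to-speech hook
) -> str:
    """One user turn: pick still images, send them with the speech transcript, read the reply aloud."""
    stills = [f for f in frames if selector.is_relevant(f)]
    reply = ask_model(stills, transcript)
    speak(reply)
    return reply


# Usage with stub callables in place of the remote model and TTS engine.
spoken: List[str] = []
selector = FrameSelector(threshold=20.0)
frames = [[0] * 4, [1] * 4, [200] * 4, [205] * 4]
reply = run_turn(
    frames,
    "what game could we play?",
    selector,
    ask_model=lambda stills, text: f"{len(stills)} images, prompt: {text}",
    speak=spoken.append,
)
```

Only the first and third frames get sent here, since the other two barely change the scene. That is the point of the design: latency and cost scale with the number of stills sent rather than with the raw camera frame rate.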