Comment by dfbrown
2 years ago
How real is it though? This blog post says
In this post, we’ll explore some of the prompting approaches we used in our Hands on with Gemini demo video.
which makes it sound like they used text + image prompts and then acted them out in the video, as opposed to Gemini interpreting the video directly.
https://developers.googleblog.com/2023/12/how-its-made-gemin...
After reading this blog post, that hands-on video is just straight-up lying to people. For the boxcar example, the narrator in the video says to Gemini:
> Narrator: "Based on their design, which of these would go faster?"
Without even specifying that those are cars! That was impressive to me, that it recognized the cars are going downhill _and_ could infer that in such a situation, aerodynamics matters. But the blog post says the real prompt was this:
> Real Prompt: "Which of these cars is more aerodynamic? The one on the left or the right? Explain why, using specific visual details."
They narrated inaccurate prompts for the Sun/Saturn/Earth example too:
> Narrator: "Is this the right order?"
> Real Prompt: "Is this the right order? Consider the distance from the sun and explain your reasoning."
If the narrator actually read the _real_ prompts they fed Gemini in these videos, this would not be as impressive at all!
Out of curiosity I've asked GPT-4V the same questions:
I'm actually pretty impressed how well it did with such basic prompts.
What do you mean "Real Prompt"? Nowhere does it say these are the real prompts, it says
> In this post, we’ll explore some of the prompting approaches we used in our Hands on with Gemini demo video.
Not "here are the full prompts used in the video" or something like that.
None of the entries match up 1:1. And the response to the car example in the video doesn't even make sense in response to the prompt in the post (no mention of speed), and certainly isn't a trimmed portion of the response in the post.
The video has the disclaimer "For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity". It would be weird to write that but not mention that neither the prompts nor responses shared even the same set of words in the same order with the "Real" prompts and responses.
I think your assumption is wrong on this one.
Wow I was blown away when I watched this video.
Now that I learned how fake it is, that is more evidence that Google is in really bad shape with this.
>If the narrator actually read the _real_ prompts they fed Gemini in these videos, this would not be as impressive at all!
It's crazy that this is where we are now. This is obviously still crazy impressive even if they hadn't done those edits.
It might still be crazy impressive, but nonetheless, going forward we now know that we cannot trust Google's videos about it, as they're heavily edited to make it look a lot more impressive than it is.
Those prompts aren't far off, but I still don't know how realistic the demo is. Until a product is in my hands, as far as I'm concerned it doesn't exist.
A lesson in how to commit securities fraud and get away with it.
Boo! Complete marketing garbage. May as well have been a Flash demo.
Yeah I think this comment basically sums up my cynicism about that video.
It's that you know some of this happened and you don't know how much. So when it says "what the quack!" presumably the model was prompted "give me answers in a more fun conversational style" (since that's not the style in any of the other clips). Was it able to do that with just a little hint, or did it take a large amount of wrangling, like "hey can you say that again in a more conversational way, what if you said something funny at the beginning like 'what the quack'", in which case it's totally unimpressive? I'm not saying that's what happened, I'm saying "because we know we're only seeing a very fragmentary transcript I have no way to distinguish between the really impressive version and the really unimpressive one."
It'll be interesting to use it more as it gets more generally available though.
You can see the cracks in the feature early on:
"What do you think I'm doing? Hint: it's a game."
Anyone with as much "knowledge" as Gemini ought to know it's roshambo.
"Is this the right order? Consider the distance from the sun and explain your reasoning."
Full prompt elided from the video.
I’ve heard of roshambo mostly from South Park
https://www.urbandictionary.com/define.php?term=roshambo
I’ve vaguely heard the term before, but I don’t know what regions of the world actually use that term.
Never heard it called that. Curious where you are from?
It's always like this, isn't it? I was watching the demo and thought, why ask it what duck is in multiple languages? Siri can do that right now and it's not an AI model. I really do think we're getting there with the AI revolution, but these demos are so far from exciting. They're just mundane dummy tasks that don't have the nuance of everything we really interact with and would need help from an AI with.
How do you know though? The responses in the video were not the same as those in the blog post.