
Comment by tikkun

2 years ago

To add to my comment above: Google DeepMind put out 16 videos about Gemini today; the total watch time at 1x speed is about 45 minutes. I've now watched them all (at >1x speed).

In my opinion, the best ones are:

* https://www.youtube.com/watch?v=UIZAiXYceBI - variety of video/sight capabilities

* https://www.youtube.com/watch?v=JPwU1FNhMOA - understanding direction of light and plants

* https://www.youtube.com/watch?v=D64QD7Swr3s - multimodal understanding of audio

* https://www.youtube.com/watch?v=v5tRc_5-8G4 - helping a user with complex requests and showing some of the 'thinking' it is doing about what context it does/doesn't have

* https://www.youtube.com/watch?v=sPiOP_CB54A - assessing the relevance of scientific papers and then extracting data from the papers

My current context: API user of OpenAI, regular user of ChatGPT Plus (GPT-4-Turbo, DALL·E 3, and GPT-4V), occasional user of Claude Pro (much less since GPT-4-Turbo gained the longer context length), paying user of Midjourney.

Gemini Pro is available starting today in Bard. It's not clear to me how many of the super impressive results are from Ultra vs Pro.

Overall conclusion: Gemini Ultra looks very impressive. But the timing is disappointing: Gemini Ultra looks like it won't be widely available until ~Feb/March 2024, or possibly later.

> As part of this process, we’ll make Gemini Ultra available to select customers, developers, partners and safety and responsibility experts for early experimentation and feedback before rolling it out to developers and enterprise customers early next year.

> Early next year, we’ll also launch Bard Advanced, a new, cutting-edge AI experience that gives you access to our best models and capabilities, starting with Gemini Ultra.

I hope there will be a product available sooner than that, without a crazy waitlist, for both Bard Advanced and the Gemini Ultra API. Also, fingers crossed that they have good data privacy for API usage, like OpenAI does (i.e., data isn't used to train their models when it comes in via API/playground requests).

My general conclusion: Gemini Ultra > GPT-4 > Gemini Pro

See Table 2 and Table 7 in https://storage.googleapis.com/deepmind-media/gemini/gemini_... (I think they're comparing against the original GPT-4 rather than GPT-4-Turbo, but it's not entirely clear).

What they've released today: Gemini Pro is in Bard today. Gemini Pro will be coming to the API soon (Dec 13?). Gemini Ultra will be available via Bard and the API "early next year".

Therefore, as of Dec 6 2023:

SOTA API = GPT-4, still.

SOTA Chat assistant = ChatGPT Plus, still, for everything except video, where Bard has capabilities. ChatGPT Plus is closely followed by Claude. (But I tried asking Bard a question about a YouTube video today, and it told me "I'm sorry, but I'm unable to access this YouTube content. This is possible for a number of reasons, but the most common are: the content isn't a valid YouTube link, potentially unsafe content, or the content does not have a captions file that I can read.")

SOTA API after Gemini Ultra is out in ~Q1 2024 = Gemini Ultra, if OpenAI/Anthropic haven't released a new model by then.

SOTA Chat assistant after Bard Advanced is out in ~Q1 2024 = Bard Advanced, probably, assuming that OpenAI/Anthropic haven't released new models by then.

Watching these videos made me remember that cool demo Google did years ago where their earbuds would auto-translate, in real time, a conversation between two people speaking different languages. It turned out to be demo vaporware. Will this be the same thing?

  • Aren't you talking about this? https://support.google.com/googlepixelbuds/answer/7573100?hl... (which exists?)

    • I think they're getting at the idea that it was demoed as a real-time babelfish, where a conversation simply happened between two people wearing the devices. Instead it was a glorified spoken dropdown selector for choosing the language, plus a press-and-hold mechanism that just tied into the existing phone app without any actual changes or upgrades to the translation mechanism that was already available. The thought was that you'd simply start talking to each other and hear the other person in your language as you go - not speak a whole block at once, stop, translate, play it back from your phone to them, stop, let them speak a whole reply while the phone listens, stop, translate, and hear their response in your earpiece. Which basically meant the device itself didn't bring much, if anything, to the table that couldn't be done with any other headphones by doing the language select and start/stop recording on the phone itself.

  • I also get this feeling. The demo videos feel heavily edited and fabricated rather than actual demos.

When I watch any of these videos, all the related videos on my right sidebar are from Google, 16 of which were uploaded at the same time as the one I'm watching.

I've never seen the entire sidebar filled with the videos of a single channel before.

  • Yeah. Dropping that blatant a weight on the algorithm is one of the most infuriating dark patterns I've noticed in a while.

Wait, so it doesn't exist yet? Thanks for watching 45 minutes of video to figure that out for me. Why am I wasting my time reading this thread?

Somebody please wake me up when I can talk to the thing by typing and dropping files into a chat box.