Comment by sinuhe69
2 years ago
I, too, was fooled into thinking Gemini had seen and heard through a live video/audio feed, rather than being shown still images and prompted through text. While the difference between still images and a video feed might seem small, in fact it requires a lot of (changing) context understanding to keep the bot from babbling like an idiot all the time. It also requires the bot to recognize an "I don't know it yet" state and keep appropriately silent in a conversation over a live video feed, which is notoriously difficult with generative AI. Certainly one can do some hacking and build in heuristics to make it easier, but making a bot seem like a human partner in a conversation is indeed very hard. And that was the most impressive aspect of the demonstrated "conversations", which are unfortunately all faked :(