
Comment by phire

2 years ago

I'm ok with "edited for latency" or "only showing the golden path".

But the most impressive part of the demo was the way the LLM just seemed to know when to jump in with a response. It appeared to wait until the user had finished the drawing, or even to jump in slightly before the drawing was finished. At one point the LLM was halfway through a response when it saw the user was now colouring the duck in blue, and started talking about how the duck appeared to be blue.

The LLM also appeared to know when a response wasn't needed because the user was just agreeing with it.

I'm not sure how many people noticed that on a conscious level, but I'm positive everyone noticed it subconsciously and felt the interaction was much more natural.

As you said, good speech-to-text and text-to-speech have already been done, along with multi-modal image/video/audio LLMs and image/music generation. The only novel thing Google appeared to be demonstrating, and the most impressive part, was this apparently natural interaction. But that part was all fake.