Comment by skylerwiernik

7 months ago

Cool idea, but just watching your demo it looks like it doesn’t work. Is there any change in the video? The lip movements certainly don’t look synchronized, and audio often continues after the person stops talking. It also doesn’t do any audio mimicking. It really doesn’t look like it does anything that Google Translate doesn’t.

Appreciate the feedback. On the video side, we currently delay it so it plays out with the translated audio (as often as possible), lining up the moment you started speaking with the moment the translated audio starts. As mentioned in another comment, we're still working on audio mimicking (voice cloning, then inflection transfer). Our model does a lot that Google Translate doesn't, even just around translation, such as taking into account who you're talking to in the meeting and the conversation context. We also have to do it much faster, which means working on smaller audio chunks at a time!
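
A rough sketch of the timing rule described above, for illustration only. The names `Utterance`, `video_delay`, and `playout_time` are hypothetical, and this is an assumption about how such alignment could be done, not the product's actual code: the video is shifted by the gap between when the speaker started talking and when the translated audio starts.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    # Both timestamps are in seconds on the same clock (assumed).
    speech_start: float      # when the speaker started talking
    translated_start: float  # when the translated audio starts playing


def video_delay(u: Utterance) -> float:
    """Delay to apply to the video so speech onset meets translated-audio onset."""
    # In a live call the translation cannot start before the speech,
    # so the delay is clamped at zero.
    return max(0.0, u.translated_start - u.speech_start)


def playout_time(frame_time: float, u: Utterance) -> float:
    """Time at which a captured video frame should be shown to the listener."""
    return frame_time + video_delay(u)
```

With this rule, a frame captured at the instant the speaker starts is shown at the instant the translated audio starts. It does not address lip movements or audio that runs longer than the original speech.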