Comment by jedberg

11 hours ago

I agree with everything you've said; I must have worded it badly.

What I was saying is the same as what you said: users will tolerate a total delay of about 500ms, and after that their happiness starts to drop off. Some Alexa utterances, the most basic ones, came in at 500ms, but most took longer.

Even with HTTP/2 and the like, though, we could get into that range because the data was sent right away. We were mostly done with speech-to-text (STT) by the time the user finished speaking, and we were already working on the answer based on the first part of the utterance.

But I would need to see some really strong evidence to even think about using WebRTC.

aenis 30 minutes ago

Sorry, I misunderstood your comment.

As for WebRTC, we chose it mainly for its decent browser support and built-in AEC (acoustic echo cancellation). I think we will revisit this design choice if we run out of other ways to optimize.