Comment by Olreich

1 year ago

Almost any gap in audio is detectable and sounds really bad. 40 ms is a lot, but sending 40 ms of silence is probably worse.

Sounds bad to whom? I’m talking about the direction from user to AI, not the direction from AI to user. If some of the audio gets delayed on the way to the AI, the AI can be paused. If some of the audio gets delayed on the way to a human, the human can’t be paused, so some buffering is needed to reduce the risk of gaps.
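
As a rough illustration of that asymmetry, here is a toy model, not anything taken from a real audio stack. The 20 ms frame size, the function names, and the rule that a late frame counts as one audible gap are all assumptions made up for the example.

```python
FRAME_MS = 20  # assumed frame duration; real codecs vary


def human_gaps(arrivals_ms, buffer_ms):
    """Count frames that miss their playout deadline for a real-time listener.

    Frame i is due at buffer_ms + i * FRAME_MS. The listener cannot be
    paused, so a frame that arrives after its deadline is counted as a gap.
    """
    return sum(
        1
        for i, arrival in enumerate(arrivals_ms)
        if arrival > buffer_ms + i * FRAME_MS
    )


def pausable_max_lag_ms(arrivals_ms):
    """Worst-case lag for a consumer that can simply wait (e.g. a model).

    Every frame is eventually consumed in order, so there are no gaps;
    the only cost is how far behind the nominal schedule it falls.
    """
    return max(
        (max(0, arrival - i * FRAME_MS) for i, arrival in enumerate(arrivals_ms)),
        default=0,
    )
```

With arrival times of `[0, 20, 40, 100, 105, 110]` ms (the fourth frame is 40 ms late and the next two arrive in a burst behind it), `human_gaps` reports three late frames with no buffer and none with a 40 ms buffer, while `pausable_max_lag_ms` reports a 40 ms lag and, by construction, no gaps at all.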