Comment by fidotron

11 hours ago

> WebRTC is designed to degrade and drop my prompt during poor network conditions

If you want real time, that's what you are going to deal with. If you don't want real time and instead imagine everything as STT -> Prompt -> TTS, then maybe you shouldn't even be sending audio on the wire at all.

6 comments

kixelated 10 hours ago

Hello, Mr. Author here. Apologies that my comment replies aren't as funny.

Every low-latency application has to decide the user experience trade-off between quality and latency. Congestion causes queuing (aka latency) and to avoid that, something needs to be skipped (lower quality).

The WebRTC latency vs. quality knob is fixed. It's great at minimizing latency, but suffers from a lack of flexibility. We still (try to) use WebRTC anyway, because like you implied, browser support has made it one of the only options.

Until now of course! WebTransport means you can achieve WebRTC-like behavior via a generic protocol. Choose how long you want to wait before dropping/resetting a stream, instead of that decision being made for you.

And yeah my point in the blog is that often the user wants streaming, but not dropping. Obviously you can stream audio input/output without WebRTC. The application should be able to decide when audio packets are lost forever... is it 50ms or 500ms or 5000ms? My argument is that voice AI shouldn't pick the 50ms option.
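
To make the "50ms or 500ms or 5000ms" choice concrete, here is a minimal sketch of an application-chosen drop deadline. The `AudioFrame` type and `applyDeadline` function are hypothetical names for illustration, not part of WebTransport or any library; the sketch only shows the policy and leaves the actual sending out.

```typescript
// Hypothetical frame type; field names are assumptions for illustration.
interface AudioFrame {
  seq: number;          // sequence number assigned by the sender
  capturedAtMs: number; // capture time in milliseconds
}

// Split queued frames into those still worth sending and those past the deadline.
function applyDeadline(
  queue: AudioFrame[],
  nowMs: number,
  maxAgeMs: number,
): { send: AudioFrame[]; dropped: AudioFrame[] } {
  const send: AudioFrame[] = [];
  const dropped: AudioFrame[] = [];
  for (const frame of queue) {
    const ageMs = nowMs - frame.capturedAtMs;
    (ageMs <= maxAgeMs ? send : dropped).push(frame);
  }
  return { send, dropped };
}

// Three frames captured at t=0, 20, 40 ms; the network stalls until t=300 ms.
const backlog: AudioFrame[] = [
  { seq: 0, capturedAtMs: 0 },
  { seq: 1, capturedAtMs: 20 },
  { seq: 2, capturedAtMs: 40 },
];

const realtime = applyDeadline(backlog, 300, 50);  // 50 ms deadline: every frame is too old
const patient = applyDeadline(backlog, 300, 5000); // 5000 ms deadline: every frame survives
```

With a WebTransport-style design, the `dropped` case would presumably map to resetting the stream that carries those frames, while `send` just keeps writing. The point is that `maxAgeMs` is a parameter the application sets.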

vince14 5 hours ago

Isn't the jitterBufferTarget [0] the latency vs. quality knob?
[0] https://developer.mozilla.org/en-US/docs/Web/API/RTCRtpRecei...
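
For anyone who hasn't used it, a rough sketch of setting that property follows. `ReceiverLike` and `setJitterTarget` are hypothetical names so the snippet also runs outside a browser; in a page you would pass the result of `pc.getReceivers()`. The 0 to 4000 ms clamp reflects my understanding of the allowed range, and the browser treats the value as a hint, not a guarantee.

```typescript
// Minimal structural type (an assumption for this sketch).
// In a browser, RTCRtpReceiver objects from pc.getReceivers() fit this shape.
interface ReceiverLike {
  jitterBufferTarget: number | null;
}

// Clamp to the 0..4000 ms range so the setter does not throw a RangeError,
// then apply the same target to every receiver.
function setJitterTarget(receivers: ReceiverLike[], targetMs: number): number {
  const clamped = Math.min(4000, Math.max(0, targetMs));
  for (const receiver of receivers) {
    receiver.jitterBufferTarget = clamped;
  }
  return clamped;
}

// Stand-in receiver for illustration only.
const fakeReceiver: ReceiverLike = { jitterBufferTarget: null };
const applied = setJitterTarget([fakeReceiver], 5000); // request above the range gets clamped
```

If that range is right, the 5000ms end of the spectrum is not reachable through this knob.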

cowsandmilk 11 hours ago

> You want real time

Isn’t the point that OpenAI’s use case does not require realtime?

When OpenAI responds, it has most of the audio in advance of when the user needs to hear it. It produces audio faster than real time, so a real time protocol is a bad fit.
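
A back-of-the-envelope sketch of that argument, with made-up numbers: `speedFactor` is a hypothetical ratio of audio seconds produced per wall-clock second, and the sketch assumes the audio reaches the client as fast as it is generated.

```typescript
// How far ahead of playback the generator is after `elapsedSec` of wall-clock time.
// speedFactor is an assumed ratio, not a measured figure.
function bufferedAheadSec(elapsedSec: number, speedFactor: number): number {
  const generatedSec = elapsedSec * speedFactor;
  const playedSec = elapsedSec; // playback consumes audio at 1x
  return generatedSec - playedSec;
}

// Longest network stall that could be absorbed without an audible gap.
function toleratedStallSec(elapsedSec: number, speedFactor: number): number {
  return Math.max(0, bufferedAheadSec(elapsedSec, speedFactor));
}

// 2 s into a response at an assumed 4x: 8 s generated, 2 s played, 6 s of slack.
const slack = toleratedStallSec(2, 4);
```

At exactly 1x the slack is zero, which is the case where a real-time protocol's drop behavior starts to make sense.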

Sean-Der 11 hours ago

That is not the case. See gpt-realtime-translate [0], which does it as a trickle instead (not turn-based).
[0] https://developers.openai.com/api/docs/models/gpt-realtime-t...

telman17 11 hours ago

Yep. Maybe there's some additional configuration I'm missing to mitigate the delay but clients don't seem to want to deal with the delay with STT -> Prompt -> TTS. They'll happily suffer occasional quality issues if the conversation feels "real".

DonHopkins 8 hours ago

>Yep. Maybe there's some [dropped] issues if the conversation feels "real".
Can you repeat that please? It didn't make any sense. This conversation doesn't feel "real".