Comment by fidotron

11 hours ago

> WebRTC is designed to degrade and drop my prompt during poor network conditions

If you want real time, that's what you are going to have to deal with. If you don't want real time and instead imagine everything as STT -> Prompt -> TTS, then maybe you shouldn't even be sending audio on the wire at all.

Hello Mr Author here. Apologies that my comment replies aren't as funny.

Every low-latency application has to decide on a user experience trade-off between quality and latency. Congestion causes queuing (aka latency), and to avoid that, something needs to be skipped (lower quality).

The WebRTC latency vs. quality knob is fixed. It's great at minimizing latency, but suffers from a lack of flexibility. We still (try to) use WebRTC anyway, because like you implied, browser support has made it one of the only options.

Until now of course! WebTransport means you can achieve WebRTC-like behavior via a generic protocol. Choose how long you want to wait before dropping/resetting a stream, instead of that decision being made for you.

And yeah my point in the blog is that often the user wants streaming, but not dropping. Obviously you can stream audio input/output without WebRTC. The application should be able to decide when audio packets are lost forever... is it 50ms or 500ms or 5000ms? My argument is that voice AI shouldn't pick the 50ms option.
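To make the 50ms vs 500ms question concrete, here is a rough sketch of an application-chosen drop deadline. It is plain TypeScript with no network code; the names `AudioChunk`, `DeadlineQueue`, `maxAgeMs` and `deliveredAfterStall` are made up for illustration and are not part of WebRTC or WebTransport. It assumes five 20ms chunks get stuck behind a 300ms network stall.

```typescript
// Hypothetical sketch: these names are illustrative, not a real library API.
interface AudioChunk {
  seq: number;
  capturedAtMs: number;
}

class DeadlineQueue {
  private chunks: AudioChunk[] = [];
  private maxAgeMs: number;
  dropped = 0;

  constructor(maxAgeMs: number) {
    // The application picks the deadline, e.g. 50, 500 or 5000.
    this.maxAgeMs = maxAgeMs;
  }

  push(chunk: AudioChunk): void {
    this.chunks.push(chunk);
  }

  // Returns the next chunk still within its deadline, discarding stale ones.
  next(nowMs: number): AudioChunk | undefined {
    while (this.chunks.length > 0) {
      const chunk = this.chunks.shift()!;
      if (nowMs - chunk.capturedAtMs <= this.maxAgeMs) return chunk;
      this.dropped++; // older than the deadline: lost forever
    }
    return undefined;
  }
}

// Simulate a stall: chunks captured at t=0,20,40,60,80 ms cannot be
// sent until t=300 ms. Returns the sequence numbers that still go out.
function deliveredAfterStall(maxAgeMs: number): number[] {
  const queue = new DeadlineQueue(maxAgeMs);
  for (let seq = 0; seq < 5; seq++) {
    queue.push({ seq, capturedAtMs: seq * 20 });
  }
  const sent: number[] = [];
  let chunk = queue.next(300);
  while (chunk !== undefined) {
    sent.push(chunk.seq);
    chunk = queue.next(300);
  }
  return sent;
}
```

With `maxAgeMs = 50` the whole utterance is thrown away after the stall, while `maxAgeMs = 500` delivers all five chunks late but intact. In a real WebTransport client, the point where `dropped` is incremented is roughly where you would give up on the stream carrying that chunk instead of continuing to wait.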

Yep. Maybe there's some additional configuration I'm missing to mitigate the delay, but clients don't seem to want to deal with the delay that comes with STT -> Prompt -> TTS. They'll happily suffer occasional quality issues if the conversation feels "real".

  • >Yep. Maybe there's some [dropped] issues if the conversation feels "real".

    Can you repeat that please? It didn't make any sense. This conversation doesn't feel "real".