Comment by jprjr_
2 hours ago
There are a few cases where WebRTC falls apart that I think MoQ could help with.
It doesn't work so well for low-latency broadcast. Your choices right now are: use WebRTC and deploy selective forwarding units, which are going to be something custom and likely involve spinning up a bunch of geographically distributed virtual machines, figuring out signalling, and so on. Or use HLS so you can lean on standard HTTP CDN tech, but gain orders of magnitude of latency.
MoQ should allow for a standardized CDN stack, meaning we should be able to have a more abstract service (instead of spinning up VMs, you just employ some company's CDN service and tell it where to get media from).
There are a lot of other little issues with WebRTC for certain specific applications. For example: last I tried it, browsers will subtly speed up audio/video to keep everything in sync, and there are scenarios where you'd rather just let the viewer fall behind a bit and skip ahead later (say you're listening to music; speeding it up isn't ideal).
Or say you want to have a group call and capture each participant's audio individually, to edit together later for something like a podcast. It's been a while since I've tried this, but I recall it being pretty difficult to do with WebRTC. All the mixing happened inside the browser's libwebrtc, and I had really limited control over it.
> use WebRTC and deploy selective forwarding units, which are going to be something custom
Would you mind explaining more? If you are doing WHIP/WHEP you should be able to drop in Broadcast Box/MediaMTX etc. and switch out servers, and no one should notice. You can use browser/mobile/ffmpeg/OBS etc. and get the same behavior. I care a lot about the broadcast space and want to learn about other problems.
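For concreteness, a browser-side WHIP publish is just an HTTP POST of the SDP offer. A minimal sketch (the endpoint URL is a placeholder, and real clients usually trickle ICE via HTTP PATCH instead of waiting for gathering):

```typescript
// Minimal WHIP publish sketch: POST the SDP offer as application/sdp,
// apply the SDP answer that comes back.
async function whipPublish(endpoint: string, stream: MediaStream): Promise<RTCPeerConnection> {
  const pc = new RTCPeerConnection();
  stream.getTracks().forEach((track) => pc.addTrack(track, stream));

  await pc.setLocalDescription(await pc.createOffer());

  // Simplification: wait for ICE gathering so the offer carries candidates.
  // Production clients typically trickle them via HTTP PATCH instead.
  await new Promise<void>((resolve) => {
    if (pc.iceGatheringState === 'complete') return resolve();
    pc.onicegatheringstatechange = () => {
      if (pc.iceGatheringState === 'complete') resolve();
    };
  });

  const resp = await fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/sdp' },
    body: pc.localDescription!.sdp,
  });
  await pc.setRemoteDescription({ type: 'answer', sdp: await resp.text() });
  return pc;
}

// Usage: the same call works against Broadcast Box, MediaMTX, or any other
// WHIP endpoint, which is the "switch out servers" point above.
const media = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
await whipPublish('https://example.com/whip/streamKey', media); // placeholder URL
```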
> subtly speed up audio/video to keep everything in sync
You can use https://webrtc.googlesource.com/src/+/refs/heads/main/docs/n... to add more delay (if you want to force more buffering). Or if you don't link the media together (via MediaStream) you don't get the behavior you describe either!
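Roughly like this (a sketch; playoutDelayHint is non-standard and Chrome-only, so treat its presence and units as assumptions):

```typescript
const pc = new RTCPeerConnection();

pc.ontrack = (event) => {
  // Non-standard, Chrome-only hint (in seconds): prefer buffering over the
  // subtle speed-ups described above. Guarded since it may not exist.
  const receiver = event.receiver as RTCRtpReceiver & { playoutDelayHint?: number };
  if ('playoutDelayHint' in receiver) {
    receiver.playoutDelayHint = 2.0;
  }

  // Giving each track its own MediaStream avoids the cross-track sync
  // behavior, since the browser no longer tries to keep them aligned.
  const el = document.createElement(
    event.track.kind === 'video' ? 'video' : 'audio',
  ) as HTMLMediaElement;
  el.srcObject = new MediaStream([event.track]);
  el.autoplay = true;
  document.body.appendChild(el);
};
```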
> capture each participant's audio individually
That's a neat problem. I haven't solved this one myself; I wonder if it's easier with RtpTransport or insertable streams?
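One approach that sidesteps libwebrtc's mixer entirely is to hang a MediaRecorder off each remote audio track before anything is mixed. A sketch (assumes the browser supports Opus-in-WebM; saveParticipantAudio is a hypothetical sink):

```typescript
// Sketch: one MediaRecorder per remote participant, so each audio track
// lands in its own file for later editing.
const pc = new RTCPeerConnection();
const recorders = new Map<string, MediaRecorder>();

// Hypothetical sink: in practice, trigger a download or upload the blob.
function saveParticipantAudio(trackId: string, blob: Blob): void {
  console.log(`participant ${trackId}: ${blob.size} bytes`);
}

pc.ontrack = (event) => {
  if (event.track.kind !== 'audio') return;

  const recorder = new MediaRecorder(new MediaStream([event.track]), {
    mimeType: 'audio/webm;codecs=opus', // assumption: supported by the browser
  });
  const chunks: Blob[] = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);
  recorder.onstop = () =>
    saveParticipantAudio(event.track.id, new Blob(chunks, { type: 'audio/webm' }));

  recorder.start(1000); // emit a chunk every second
  recorders.set(event.track.id, recorder);
};
```

Since each recorder sees its track before any mixing happens, the per-participant files stay clean.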
Regarding SFUs: with something like HLS, I can really easily scale up using something like a caching CDN (not entirely sure if that's the right term). The idea goes: I distribute the HLS media playlist and have my media segment entries prefixed with the caching/CDN service's hostname. The service is configured with the actual origin server, and when a segment isn't in the CDN, the CDN fetches it from the origin on demand. That was a nice option when I was doing Owncast streaming, since I really only paid based on viewership and just had to make sure I had the correct cache-related headers on my media segments.
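The header part is simple because segments are immutable once written; something like this is all the origin really needs (a Node sketch; paths and max-ages are assumptions):

```typescript
// Sketch of an HLS origin behind a pull-through CDN. Segments never change
// once written, so they can be cached aggressively; the media playlist
// updates every few seconds, so it gets a short max-age.
import http from 'node:http';
import { createReadStream, existsSync } from 'node:fs';
import path from 'node:path';

const MEDIA_DIR = '/var/hls'; // assumption: where the encoder writes output

http.createServer((req, res) => {
  const urlPath = (req.url ?? '/').split('?')[0];
  const file = path.join(MEDIA_DIR, path.normalize(urlPath));
  if (!file.startsWith(MEDIA_DIR) || !existsSync(file)) {
    res.writeHead(404).end();
    return;
  }
  if (file.endsWith('.ts') || file.endsWith('.m4s')) {
    // Immutable media segments: let the CDN hold them as long as it likes.
    res.setHeader('Cache-Control', 'public, max-age=31536000, immutable');
  } else if (file.endsWith('.m3u8')) {
    // Live playlist: revalidate roughly once per segment duration.
    res.setHeader('Cache-Control', 'public, max-age=2');
    res.setHeader('Content-Type', 'application/vnd.apple.mpegurl');
  }
  createReadStream(file).pipe(res);
}).listen(8080);
```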
Alternatively, I can push media segments up to a CDN and distribute that way, using an S3-compatible service, or just rsyncing to a server with better bandwidth, etc. One thing I didn't care for, again back when I was broadcasting with Owncast, was that I needed to make sure old media segments were expired, otherwise I would rack up an insane bill. I had a 24/7 Owncast stream, and if you're not on top of expiring media segments with your CDN, it gets expensive fast.
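These days I'd push the expiry onto the storage side, e.g. a lifecycle rule so old segments delete themselves (a sketch with the AWS SDK v3, assuming the S3-compatible provider supports lifecycle rules; bucket, prefix, and endpoint are hypothetical):

```typescript
// Sketch: auto-expire old HLS segments so a 24/7 stream doesn't accumulate
// storage (and cost) forever. Names below are hypothetical.
import {
  S3Client,
  PutBucketLifecycleConfigurationCommand,
} from '@aws-sdk/client-s3';

const s3 = new S3Client({
  region: 'us-east-1',
  endpoint: 'https://s3.example.com', // hypothetical S3-compatible endpoint
});

await s3.send(
  new PutBucketLifecycleConfigurationCommand({
    Bucket: 'my-hls-bucket', // hypothetical
    LifecycleConfiguration: {
      Rules: [
        {
          ID: 'expire-old-segments',
          Status: 'Enabled',
          Filter: { Prefix: 'live/' }, // hypothetical segment prefix
          Expiration: { Days: 1 }, // delete segments a day after upload
        },
      ],
    },
  }),
);
```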
The overall idea is: serving HLS is ultimately serving files, and there's a good amount of tooling for that, right?
Now that you mention it, I think WHIP/WHEP can solve some of that. I just don't know of any service where I can have that same cache/CDN-like experience, of either having the CDN connect to the origin as needed and fan out, or where I can push up and let the service distribute. (Though now I'm googling for "webrtc sfu as a service" and see that it is a thing!)
Didn't know about the playout delay extension.
Whether capturing individual audio is easier with RtpTransport or insertable streams, I'm unsure. Possibly? I just figure that since MoQ is going to rely on things like WebCodecs/WebAudio, there's hopefully a bit more control over what happens with audio as it comes in.
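That's the kind of control I mean: with WebCodecs you'd get a decoder per participant and raw audio out the other side, yours to buffer, record, or mix. A sketch (how encoded frames arrive, e.g. over a MoQ track, is an assumption here):

```typescript
// Sketch: one WebCodecs AudioDecoder per participant. Everything downstream
// of `output` is under application control, unlike libwebrtc's internal mixer.
const decoder = new AudioDecoder({
  output: (audio: AudioData) => {
    // Raw PCM for this one participant: copy it out, schedule it with
    // WebAudio, or append it to a per-participant recording.
    audio.close();
  },
  error: (e) => console.error('decode error', e),
});
decoder.configure({ codec: 'opus', sampleRate: 48000, numberOfChannels: 2 });

// Assumption: the transport hands us framed Opus packets with timestamps
// in microseconds.
function onEncodedFrame(data: Uint8Array, timestampUs: number): void {
  decoder.decode(
    new EncodedAudioChunk({ type: 'key', timestamp: timestampUs, data }),
  );
}
```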
I'll admit, though, I've started noticing how often podcasts are clearly recorded using something that doesn't allow per-participant recordings, and I'm guessing that as long as the quality is good enough, most people aren't worrying about it.
EDIT: I feel like I should mention that Pion rules. I used it a few years ago to put together an SRT-to-WebRTC thing and an RTMP-to-WebRTC thing to use with Janus Gateway; it was so easy.