Comment by Dylan16807

20 hours ago

> When the network is bad, you get... fewer JPEGs. That’s it. The ones that arrive are perfect.

You can still have weird broken stallouts, though.

I dunno, this article has some good problem-solving, but the biggest and mostly untouched issue is that they set the minimum H.264 bandwidth too high. H.264 can do a lot better than JPEG with a lot less bandwidth, but if you lock it at 40Mbps, of course it's flaky. Try 1Mbps and iterate from there.

And going keyframe-only is the opposite of how you optimize video bandwidth.
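
If you want to see how much inter-frame prediction buys you, here is a rough sketch comparing keyframe-only encoding against a normal GOP - it assumes ffmpeg with libx264 on your PATH, and the clip name and settings are placeholders rather than the article's actual pipeline:

```python
import os
import subprocess

SRC = "screen.mp4"  # hypothetical screen-capture clip

def encode(out: str, gop: int) -> int:
    """Encode SRC with a given keyframe interval and return the output size in bytes."""
    subprocess.run([
        "ffmpeg", "-y", "-i", SRC,
        "-c:v", "libx264", "-crf", "23",
        "-g", str(gop),   # GOP length; 1 means every frame is a keyframe
        "-an", out,
    ], check=True)
    return os.path.getsize(out)

intra_only = encode("intra.mp4", 1)   # keyframe-only, roughly what polling JPEGs amounts to
normal_gop = encode("gop.mp4", 120)   # let the encoder use P-/B-frame prediction
print(f"keyframe-only: {intra_only/1e6:.1f} MB vs normal GOP: {normal_gop/1e6:.1f} MB")
```

On mostly-static desktop content the normal-GOP file is usually dramatically smaller, which is the whole point of reaching for a video codec in the first place.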

> Try 1Mbps and iterate from there.

From the article:

> “Just lower the bitrate,” you say. Great idea. Now it’s 10Mbps of blocky garbage that’s still 30 seconds behind.

  • The problem, I think, is that they are using Moonlight, which is "designed" to stream games at very low latency. I very much doubt that people need <30ms response times when watching an agent terminal or whatever they are showing!

    When you try to use H.264 et al. at low latency, you have to get rid of a lot of optimisations so it can be encoded as quickly as possible. I also highly suspect the VAAPI encoder is not very good, especially at low bitrates.

    I _think_ Moonlight also forces CBR instead of VBR, which is pretty awful for this use case - imagine you have 9 seconds of 'nothing changing' and then the window moves for 0.25 seconds. If you had VBR, the encoder could basically send ~0kbit/sec apart from control metadata, and then spike the bitrate up when the window moved (for brevity I'm simplifying here, it's more complicated than this, but hopefully you get the idea - rough sketch below).

    Basically they've used the wrong software entirely. They should try looking at xrdp with x264 as a start.
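
    To make the CBR/VBR contrast concrete, here is a rough sketch of the same encoder driven both ways through ffmpeg/libx264 - this is not Moonlight's actual pipeline, and the file name and numbers are made up:

    ```python
    import subprocess

    SRC = "idle_desktop.mp4"  # hypothetical capture: ~9s of static screen, then a brief window move

    # CBR-style: the encoder has to keep emitting ~10 Mbit/s even while nothing on screen changes.
    subprocess.run([
        "ffmpeg", "-y", "-i", SRC, "-c:v", "libx264",
        "-b:v", "10M", "-minrate", "10M", "-maxrate", "10M", "-bufsize", "2M",
        "-x264-params", "nal-hrd=cbr", "cbr.mp4",
    ], check=True)

    # Constrained VBR (CRF): bits are spent only when the picture actually changes,
    # capped so a burst of motion can't exceed the link.
    subprocess.run([
        "ffmpeg", "-y", "-i", SRC, "-c:v", "libx264",
        "-crf", "23", "-maxrate", "4M", "-bufsize", "8M", "vbr.mp4",
    ], check=True)
    ```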

    • Yeah, I think the author has been caught out by the fact that there simply isn’t a canonical way to encode H.264.

      JPEG is nice and simple: most encoders will produce (more or less) the same result for any given quality setting. The standard tells you exactly how to compress the image. Some encoders (like mozjpeg) use a few non-standard tricks to produce 5-20% better compression, but that’s essentially just a clever lossy preprocessing pass.

      With H.264, the standard essentially just says how decompressors should work, and it’s up to the individual encoders to work out how to make best use of the available functionality for their intended use case. I’m not sure any encoder uses the full functionality (x264 refuses to use arbitrary frame order without B-frames, and I haven’t found an encoder that takes advantage of that), which means different encoders produce wildly different results.

      I’m guessing Moonlight assumes that most of its compression will come from motion prediction, and then takes massive shortcuts when encoding I-frames.

  • Rejecting it out of hand isn't actually trying it.

    10Mbps is still way too high of a minimum. It's more than YouTube uses for full motion 4k.

    And it would not be blocky garbage, it would still look a lot better than JPEG.

  • Proper rate control for this kind of realtime streaming would also lower the framerate and/or resolution to maintain the best quality and latency possible under dynamic network conditions, with however little bandwidth is available. The fundamental issue is that they don't have this control loop at all, and are badly simulating it by polling JPEGs.
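
    A rough sketch of what such a control loop could look like - the thresholds and the bitrate/fps/resolution ladder below are invented for illustration, not anyone's real implementation:

    ```python
    from dataclasses import dataclass

    @dataclass
    class LinkStats:
        throughput_kbps: float   # measured goodput over the last window
        rtt_ms: float            # smoothed round-trip time
        loss_pct: float          # packet loss over the last window

    # (bitrate kbps, fps, height), worst rung to best rung
    LADDER = [(300, 10, 360), (800, 15, 540), (2000, 30, 720), (4000, 30, 1080)]

    def pick_rung(stats: LinkStats, current: int) -> int:
        """Step down aggressively on congestion, step back up cautiously when healthy."""
        bitrate, _, _ = LADDER[current]
        congested = (stats.loss_pct > 2 or stats.rtt_ms > 150
                     or stats.throughput_kbps < bitrate * 1.2)
        if congested and current > 0:
            return current - 1
        can_step_up = (current + 1 < len(LADDER)
                       and stats.throughput_kbps > LADDER[current + 1][0] * 1.5)
        if not congested and can_step_up:
            return current + 1
        return current

    # e.g. once a second: rung = pick_rung(latest_stats, rung), then reconfigure
    # the encoder's bitrate/fps/resolution if the rung changed.
    ```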

  • 10Mbit/s is more than the maximum ingest bitrate allowed on Twitch. Granted, anyone who watches a recent game or an IRL stream there might tell you that it should go up to 12 or 15, but I don't think an LLM interface should have trouble. This feels like someone on a 4K monitor defeating themselves through their hedonic treadmill.

It might be possible to buffer and queue JPEGs for playback as well, to help with the weird broken stallouts (rough sketch below).

Video players used to call this buffering, and when the buffer couldn't keep up, that was called buffering issues.

Players today can keep an eye on network quality while playing too, which is neat.
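
A very rough sketch of that JPEG buffering idea - the cushion size is a made-up placeholder and the network/frame source isn't shown:

```python
import collections

class JpegPlaybackBuffer:
    """Queue incoming JPEG frames and hold playback until a small cushion exists."""

    def __init__(self, target_frames: int = 10):
        self.queue = collections.deque()
        self.target = target_frames
        self.playing = False

    def push(self, jpeg_bytes: bytes) -> None:
        # Called whenever a frame arrives, however bursty the network is.
        self.queue.append(jpeg_bytes)

    def next_frame(self):
        # Stay in "buffering..." until the cushion is rebuilt, then drain steadily.
        if not self.playing and len(self.queue) >= self.target:
            self.playing = True
        if self.playing and self.queue:
            return self.queue.popleft()
        self.playing = False
        return None   # caller keeps showing the last frame while the buffer refills
```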