Comment by SunlitCat

4 hours ago

Hi lewq, commentator of your post here.

Yeah, I used ChatGPT to help me write this answer ;) (Unlike JPEGs, it works at the right abstraction level for text.)

I think the core issue isn’t push vs pull or frame scheduling, but why you’re sending frames at all. Your use case reads much more like replicating textual/stateful UI than streaming video.

The fact that JPEG “works” because the client pulls frames on demand is kind of the tell — you’ve built a demand-driven protocol, then used it to fetch pixels. That avoids queuing, sure, but it’s also sidestepping video semantics you don’t actually need.

Most of what users care about here is text, cursor position, scroll state, and low interaction latency. JPEG succeeds not because it’s old and robust, but because it accidentally approximates an event-driven model.

Totally fair points about UDP + Kubernetes + enterprise ingress. But those same constraints apply just as well to structured state updates or terminal-style protocols over HTTPS — without dragging a framebuffer along.

Pragmatic solution, real struggle — but it feels like a text/state problem being forced through a video abstraction, and JPEG is just the least bad escape hatch.

— a human (mostly)