
Comment by corndoge

14 hours ago

From hosting a PeerTube instance solely for my own stuff for several years, I've come to appreciate just how difficult self-hosting a streaming video platform is. As you say, bandwidth and storage requirements are significant; another, less obvious one is transcoding. When a user uploads an HD video file, it needs to be transcoded into lower resolutions if you want there to be any hope of people streaming it. While PeerTube itself is perfectly happy running on 2-4 vCPU cores on a cheap cloud VM, if you use those cores to handle transcode jobs it can take a huge amount of time (like 20+ hours) to transcode even a medium-length 1080p video. So you really need either a lot of CPU that sits mostly idle, or hardware acceleration, both of which are expensive when purchased from cloud providers. Or you can use remote transcoding to offload transcode jobs onto your home gaming PC or whatever, which works well but can be complicated and a bit touchy to set up properly, and now you have a point of failure dependent on your home network...
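
For concreteness, here is a rough sketch of the kind of per-upload work being described: one full ffmpeg encode per target resolution. This is not PeerTube's actual pipeline; the resolution ladder, file names, and encoder settings are illustrative assumptions.

```ts
// Sketch only: one ffmpeg run per rendition, the way a small instance would
// burn its 2-4 vCPUs for hours on a single upload. Not PeerTube's real code.
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Hypothetical resolution ladder; real server configs often add more steps.
const ladder = [
  { name: "1080p", height: 1080 },
  { name: "720p", height: 720 },
  { name: "480p", height: 480 },
];

async function transcode(source: string): Promise<void> {
  for (const { name, height } of ladder) {
    // Software x264 encode: this is where the "20+ hours" goes on a cheap VM.
    // Hardware acceleration (e.g. h264_nvenc) changes the math entirely.
    await run("ffmpeg", [
      "-i", source,
      "-vf", `scale=-2:${height}`,
      "-c:v", "libx264", "-preset", "slow", "-crf", "23",
      "-c:a", "aac", "-b:a", "128k",
      `${name}.mp4`,
    ]);
  }
}

transcode("upload.mp4").catch(console.error);
```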

And then, people watching videos are used to the YouTube experience, with its world-class CDN infra enabling sub-second first-frame latencies even for 4k videos. They go on PeerTube and the first frame takes like 5 seconds for a 1080p video... realistically, with today's attention spans, most of them are going to bounce before it ever plays.

Since you seem like you have practical knowledge here, I hope you don't mind me asking:

Would it change the equation, meaningfully, if you didn't offer any transcoding on the server and required users to run any transcoding they needed on their own hardware? I'm thinking of, for instance, a WASM build of ffmpeg running on the instance's website, rather than requiring users to install a separate application.

Do you think a general user couldn't handle the workload (mobile processing, battery, etc.), or would that be fairly reasonable for a modern device and only onerous in the high-traffic server environment?
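
For what it's worth, here is a minimal sketch of what the in-browser idea could look like, using the ffmpeg.wasm project (@ffmpeg/ffmpeg). The calls follow that library's 0.12.x-style API as I understand it; treat the exact signatures, the single 720p target, and the upload endpoint as assumptions.

```ts
// Sketch: transcode in the user's browser with ffmpeg.wasm before uploading.
// "/api/upload" and the file names are placeholders, not a real PeerTube API.
import { FFmpeg } from "@ffmpeg/ffmpeg";
import { fetchFile } from "@ffmpeg/util";

async function transcodeAndUpload(file: File): Promise<void> {
  const ffmpeg = new FFmpeg();
  await ffmpeg.load(); // pulls down the wasm core on first use

  // Copy the user's file into ffmpeg's in-memory FS and encode one rendition.
  await ffmpeg.writeFile("input.mp4", await fetchFile(file));
  await ffmpeg.exec([
    "-i", "input.mp4",
    "-vf", "scale=-2:720",
    "-c:v", "libx264", "-preset", "veryfast", "-crf", "26",
    "-c:a", "aac",
    "output-720p.mp4",
  ]);

  // Ship the result to the server; repeat for each resolution you want.
  const data = await ffmpeg.readFile("output-720p.mp4");
  await fetch("/api/upload", {
    method: "POST",
    body: new Blob([data], { type: "video/mp4" }),
  });
}
```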

  • > Would it change the equation, meaningfully, if you didn't offer any transcoding on the server and required users to run any transcoding they needed on their own hardware?

    I think the user experience would be quite poor, enough that nobody would use the instance. As an example, a 4k video will be transcoded at least 2 times, to 1080p and 720p, and depending on server config often several more times. Each transcode job takes a long time, even with substantial hwaccel on a desktop.

    Very high bitrate video is quite common now since most phones, action cameras etc are capable of 4k30 and often 4k60.

    > Do you think a general user couldn't handle the workload (mobile processing, battery, etc.), or would that be fairly reasonable for a modern device and only onerous in the high-traffic server environment?

    If I had to guess, I would expect it to be a poor experience. Say I take a 5-minute video; that's probably around 3-5 GB. I upload it, then need to wait - in the foreground - for this video to be transcoded and uploaded to object storage 3 times, all on a phone chip. People won't do it.

    I do like the idea of offloading transcoding to users. I wonder if it might be suited to something like https://rendernetwork.com/, where users contribute idle compute to a transcode pool in exchange for upload & storage rights, and still get fire-and-forget uploads.

I shove 1080p mp4s onto a very cheap server and get 2 seconds of load time there, versus somewhere between 1 and 2 seconds on YouTube. And looking at network requests, the first chunk of the file loads in well under a second, so I'd expect something with the metadata preloaded could start playing at that point. So if PeerTube takes 5 seconds, I really wonder why.

Is it inconvenient to transcode before/during upload?
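
The "metadata preloaded" point above usually comes down to where the mp4's index (the moov atom) sits in the file; a remux can move it to the front without re-encoding. A minimal sketch using ffmpeg's standard `-movflags +faststart` flag, with illustrative file names:

```ts
// Remux only (no re-encode): put the moov atom at the start of the file so a
// player can begin playback as soon as the first chunk arrives.
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

async function faststart(input: string, output: string): Promise<void> {
  await run("ffmpeg", [
    "-i", input,
    "-c", "copy",              // stream copy: cheap and lossless
    "-movflags", "+faststart", // rewrite with the index up front
    output,
  ]);
}

faststart("raw-1080p.mp4", "web-1080p.mp4").catch(console.error);
```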

  • If you scale an instance you need to use object storage (S3/B2/etc). Fetches from object storage can occasionally have latency spikes.

    5 seconds is somewhat of an exaggeration; I clicked through 10 or so videos on my instance to check and it's 2-3 seconds most of the time.

    • We can exclude rare enough outliers.

      I've experienced B2 throwing a wrench into the dream of low latency, but some object stores are very fast. And more importantly, you only need the first couple of megabytes of each video to be on fast storage.
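
      One way to act on that "first couple of megabytes" observation is to keep only the playlist and the first few HLS segments of each video on fast local storage and send everything else to object storage. A rough sketch, assuming Express-style routing, segment names like `0.ts`, and made-up paths and bucket URL:

      ```ts
      // Sketch: hot-cache the start of each video locally, redirect the rest
      // to object storage. Paths, bucket URL, and naming are assumptions.
      import express from "express";
      import { existsSync } from "node:fs";
      import path from "node:path";

      const app = express();

      const LOCAL_HLS_DIR = "/var/cache/hls";                       // hypothetical hot cache
      const OBJECT_STORAGE_BASE = "https://bucket.example.com/hls"; // hypothetical bucket
      const HOT_SEGMENTS = 3;                                       // keep segments 0..2 locally

      app.get("/hls/:videoId/:file", (req, res) => {
        const { videoId, file } = req.params;
        if (videoId.includes("..") || file.includes("..")) return res.status(400).end();

        const segment = /^(\d+)\.(ts|m4s)$/.exec(file);
        const isHot = segment !== null && Number(segment[1]) < HOT_SEGMENTS;
        const isPlaylist = file.endsWith(".m3u8");
        const localPath = path.join(LOCAL_HLS_DIR, videoId, file);

        // Playlist + first segments come from fast local disk -> quick first frame...
        if ((isPlaylist || isHot) && existsSync(localPath)) {
          return res.sendFile(localPath);
        }

        // ...and the long tail of segments is fetched straight from object storage.
        return res.redirect(302, `${OBJECT_STORAGE_BASE}/${videoId}/${file}`);
      });

      app.listen(8080);
      ```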

The funny thing is that YouTube has now enshittified to the point where people routinely DO wait well over 5 seconds to watch the video they actually wanted to watch, while interstitials and other commercials are jammed in. Even with adblock enabled, the latest YouTube code won't unlock the first frame of the actual video until some period of ad time has passed, so you just sit there looking at a black screen. This on its own definitely isn't enough to get people to leave the platform, but it's still notable how much worse the experience has gotten compared to a few years ago.

  • On what setup? All YouTube videos load and start playing instantly for me. Every time I've experienced otherwise in the last couple of years, it's been my first indication that e.g. AWS is exploding that day.

    • I wonder if it depends on what country you are in. I only notice it occasionally, when a video won't play in FreeTube or PipePipe (which for the last few months has always had that pause at the start) and I'm forced to open an incognito browser tab to watch; then I realize just how many ads other people are being subjected to before they can even watch the video.

What value do you get from transcoding your own stuff? I have Plex transcoding disabled on all local-network devices that stream from it and run into minimal issues (codecs on TV devices, mostly).

  • By "my own stuff" I mean that I use my instance to upload videos I would otherwise upload to youtube - videos I made that I intend to share with people. The usual reasons for transcoding apply.