
Comment by jraph

3 years ago

And as a user of youtube-dl, I'm quite happy about this. This probably allows a very safe, restricted "subset" of JS. Way better than using a full JS engine. 900 lines is still small and manageable.

yt-dlp sometimes doesn't know how to evaluate the JavaScript/ECMAScript and will call out to an optional dependency, a real JavaScript interpreter, if one is installed.
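To illustrate the idea (this is a hypothetical sketch, not yt-dlp's actual code): a tiny whitelist-based interpreter handles the simple expressions it understands, and anything outside that subset is either rejected or handed to an external interpreter if one happens to be installed. The `eval_restricted`/`eval_with_fallback` names are invented for this example, and it only covers expressions whose syntax happens to overlap Python's.

```python
import ast
import operator
import shutil
import subprocess

# Operators the mini-interpreter is willing to evaluate. Everything
# else (function calls, attribute access, imports...) is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Mod: operator.mod,
}

def eval_restricted(expr):
    """Evaluate a whitelisted arithmetic subset; raise on anything else."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported construct")
    return walk(ast.parse(expr, mode="eval"))

def eval_with_fallback(expr):
    """Try the auditable subset first; fall back to an optional external
    JS runtime (here assumed to be node) only if one is installed."""
    try:
        return eval_restricted(expr)
    except ValueError:
        node = shutil.which("node")  # the optional dependency
        if node is None:
            raise
        out = subprocess.run([node, "-e", f"console.log({expr})"],
                             capture_output=True, text=True, check=True)
        return out.stdout.strip()
```

The point of the thread: the whole attack surface of `eval_restricted` fits on one screen, which is the property the < 900-line interpreter is being praised for.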

I'm trying to understand the threat model here. Is the concern that YouTube will inject JS into the payload that tries to break out of the youtube-dl JS sandbox using some zero-day in whatever JS engine they would use instead?

  • One of the reasons people use yt-dlp/youtube-dl (and nitter.net/etc) is to transform the modern proprietary JavaScript web into something more suitable for enthusiasts of the old document web and of FOSS. If the web switched to plain <video> then yt-dlp/youtube-dl would become completely unnecessary. Your browser should not have to run JS to watch an embedded video.

    • On my Ivy Bridge laptop running Linux, enabling hardware video decoding in mpv took installing one package and adding one line to mpv.conf. Enabling hardware decoding in Firefox took multiple rounds of frantic Googling, toggling flags in about:config, passing logging environment variables to Firefox, recording a Pernosco trace of multi-process communication, and even asking for help in the gfx-firefox Matrix chat. There they pointed out I had disabled media.rdd-process.enabled, which caused Firefox to print a misleading error in about:support: it claimed HARDWARE_VIDEO_DECODING was available, then failed at runtime saying WebRender was disabled even though it was enabled. And to my knowledge, hardware decoding in Chromium is simply not possible on Linux right now (maybe possible on Chromebooks, I haven't checked).

      Even after I fixed hardware acceleration, playing a 1080p YouTube video in Firefox using hardware H.264 decoding took more CPU energy (40% of a core) than playing the same video in mpv using software H.264 decoding (20% of a core). Web browsers are just horrifically complex, intractable to understand, and inefficient.
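For reference, the "one line in mpv.conf" the commenter mentions isn't quoted, but the usual way to enable hardware decoding in mpv looks like this (a guess at what they added, not their actual config):

```
# ~/.config/mpv/mpv.conf
# Ask mpv to pick a hardware decoder it considers safe (VA-API on
# typical Intel/Linux setups); the exact line used above is not stated.
hwdec=auto-safe
```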

  • Google attempting zero days on client computers would be something. It's not totally without precedent (Sony CD rootkits - https://en.wikipedia.org/wiki/Sony_BMG_copy_protection_rootk...) but would still be major news.

    • While they likely wouldn't use a zero-day, their JS files, particularly for automated captchas, do push the boundaries of whatever JS engine they're executed inside. See https://github.com/neuroradiology/InsideReCaptcha#the-analys... and note that this analysis is 8 years old. While there's minimal risk if you're using either a full-fledged modern JS engine or a limited-subset interpreter like the OP, an older or non-optimized spec-compliant JS engine might hit pathological performance cases and result in you DoSing yourself.


  • Let's say they end up using Node. Node has a fairly complete standard library that lets you access the filesystem, the network, and everything else.

    Now if they do it right and only embed some bare JS interpreter, it's still way harder to audit than these < 900 lines, for which it is quite easy to convince oneself that the interpreted script cannot do much.

    • Nowadays they could probably use Deno. Without permissions it doesn't allow network or file access etc.
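For context on the Deno point: Deno is deny-by-default, so a script gets no file, network, or environment access unless the caller grants it explicitly on the command line. An illustration of the flags (the script name is made up):

```
# No flags: any fetch(), Deno.readTextFile(), etc. in the script
# fails with a PermissionDenied error.
deno run untrusted.js

# Grant network access, and only to a single host:
deno run --allow-net=example.com untrusted.js
```

Even so, this only shrinks the standard-library attack surface; the engine underneath is still full V8, which is the audit-surface point the parent comment is making.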

  • Embedding a whole JS engine and then interoperating with it from Python would be non-trivial. Good luck fixing any bugs or corner cases you hit that way. The V8 and SpiderMonkey embedding APIs are both C++ (IIRC) and non-trivial to use correctly.

    Having full control like this plus simple code is probably lower risk and more maintainable, even if there's the challenge of expanding the feature set when the scripts change.

    The alternative would be a console JS shell, but those are very different from browsers, so that poses its own challenges.

  • youtube-dl targets a lot of websites other than Google properties, many of which are a lot sketchier (think, uh, NSFW streaming sites).

That’s the exact same logic I hear from developers who say things like:

Why do I need a full XML parser when I can just extract what I need with regex?

And:

All that RPC IDL stuff is overcomplicated, REST is so much easier because I can just write the client by hand.
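The first analogy is easy to demonstrate concretely: a naive regex works on the happy path and silently breaks on equally legal variants of the same document, while a real parser handles both. A small sketch (the helper names are invented for illustration):

```python
import re
import xml.etree.ElementTree as ET

# Two legal encodings of the same element: the naive pattern handles
# the first, but an attribute and an entity defeat it on the second.
simple = '<title>Hello</title>'
tricky = '<title attr="x">He said &lt;hi&gt;</title>'

pattern = re.compile(r'<title>(.*?)</title>')

def title_via_regex(xml):
    m = pattern.search(xml)
    return m.group(1) if m else None

def title_via_parser(xml):
    # A real parser tolerates attributes and decodes entities.
    return ET.fromstring(xml).text

print(title_via_regex(simple))   # works on the happy path
print(title_via_regex(tricky))   # None: the attribute broke the pattern
print(title_via_parser(tricky))  # correct text, entities decoded
```

The counterargument in the thread is that a deliberately restricted interpreter is not the same trade-off: its limits are chosen and auditable, not accidental.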