Comment by esprehn
3 years ago
This isn't really JS; it's a purpose-built evaluator that's only for evaluating a particular script on YouTube, and it assumes a huge list of things are true about how YouTube's JS is written.
For example, it's got a hard-coded list of methods for String, and it doesn't respect prototypes. It only supports creating Date instances, and won't work if you override the global Date. It parses with regexes and implements all operators with Python's operator module (which has the wrong type semantics), etc. Nearly none of the semantics of JS are implemented.
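To make the "wrong type semantics" point concrete, here is a small illustration using only Python's standard operator and math modules. The helpers js_add, js_to_string and js_mod are hypothetical and only approximate JS for str/int/float operands; they are not youtube-dl code.

```python
import math
import operator

def js_to_string(v):
    """Stringify a str/int/float roughly the way JS does (integral floats drop ".0")."""
    if isinstance(v, float) and v.is_integer():
        return str(int(v))
    return str(v)

def js_add(a, b):
    """Approximate JS binary + for str/int/float operands only.

    Real ECMAScript runs ToPrimitive first and covers many more types
    (booleans, null, undefined, objects); this sketch ignores all of that.
    """
    if isinstance(a, str) or isinstance(b, str):
        return js_to_string(a) + js_to_string(b)
    return a + b

def js_mod(a, b):
    """JS % keeps the sign of the dividend, like C's fmod (b == 0 is not handled)."""
    return math.fmod(a, b)

# Python's operator module follows Python semantics, not JS semantics:
try:
    operator.add("1", 2)
except TypeError:
    print('operator.add("1", 2) raises TypeError')
print(js_add("1", 2))        # 12   (JS: "1" + 2 === "12")
print(operator.mod(-7, 3))   # 2    (Python: result takes the sign of the divisor)
print(js_mod(-7, 3))         # -1.0 (JS: -7 % 3 === -1)
```

An evaluator that maps JS `+` or `%` straight onto operator.add or operator.mod inherits the Python behavior shown in the first and third cases.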
It's sort of the sandwich categorization problem:
If I write a C# "interpreter" in Perl that's only 200 lines and just handles string.Join, string.Concat and Console.WriteLine, and it doesn't actually try to implement C# syntax or semantics at all but just uses Perl semantics for those operations, is it actually C#? :P
I say "not a sandwich".
And as a user of youtube-dl, I'm quite happy about this. This probably allows a very safe, restricted "subset" of JS. Way better than using a full JS engine. 900 lines is still small and manageable.
yt-dlp sometimes doesn't know how to evaluate the JavaScript/ECMAScript and will call out to an optional dependency, a real JavaScript interpreter, if one is installed.
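The "built-in first, real engine if installed" pattern might look roughly like this. It is a sketch under stated assumptions: evaluate_with_fallback is a hypothetical helper, and using Node.js ("node" on PATH) as the external runtime is an illustrative choice, not a claim about which runtime yt-dlp actually looks for.

```python
import shutil
import subprocess

def evaluate_with_fallback(expression, builtin_eval, timeout=10):
    """Try a restricted in-process evaluator first, then an external JS runtime.

    `builtin_eval` is any callable that raises NotImplementedError for
    constructs it does not support. The external runtime is assumed to be
    Node.js found as "node" on PATH (an assumption for illustration).
    """
    try:
        return builtin_eval(expression)
    except NotImplementedError:
        pass  # fall through to the external runtime

    node = shutil.which("node")
    if node is None:
        raise RuntimeError("unsupported expression and no JS runtime installed")

    # `node -p <script>` evaluates the script and prints the result.
    # Note: this runs with Node's full standard library (files, network),
    # so none of the restrictions of the in-process evaluator apply here.
    completed = subprocess.run(
        [node, "-p", expression],
        capture_output=True, text=True, check=True, timeout=timeout,
    )
    return completed.stdout.strip()
```

The two branches return different things (whatever builtin_eval returns versus Node's printed output as a string), which is one small example of the interop friction discussed further down the thread.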
I'm trying to get the threat model here. Is the concern that YouTube will inject JS into the payload that tries to break out of the youtube-dl JS sandbox using some zero-day in whatever JS engine they would use instead?
One of the reasons people use yt-dlp/youtube-dl (and nitter.net/etc) is to transform the modern proprietary JavaScript web into something more suitable for enthusiasts of the old document web and of FOSS. If the web switched to plain <video> then yt-dlp/youtube-dl would become completely unnecessary. Your browser should not have to run JS to watch an embedded video.
Google attempting zero days on client computers would be something. It's not totally without precedent (Sony CD rootkits - https://en.wikipedia.org/wiki/Sony_BMG_copy_protection_rootk...) but would still be major news.
Let's say they end up using Node. Node has a quite complete standard library that lets you access files and everything.
Now if they do it right and only embed some bare JS interpreter, it's still way harder to audit than these < 900 lines, for which it is quite easy to convince oneself that the interpreted script cannot do much.
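One reason a small evaluator is easy to audit is that its reachable surface can be an explicit table. A minimal sketch, assuming a hypothetical whitelist of String methods (STRING_METHODS and call_string_method are illustrative names, not youtube-dl's):

```python
# Hypothetical whitelist: the only String "methods" a script may call.
STRING_METHODS = {
    # JS "abc".split("") gives one element per character
    "split": lambda s, sep: list(s) if sep == "" else s.split(sep),
    # JS slice and Python slicing agree on negative indices
    "slice": lambda s, start, end=None: s[start:end],
    # ord() returns a code point; JS returns a UTF-16 code unit (same for BMP chars)
    "charCodeAt": lambda s, i=0: ord(s[i]),
}

def call_string_method(receiver, name, *args):
    """Dispatch a JS-style method call on a str, rejecting anything not whitelisted."""
    if not isinstance(receiver, str):
        raise TypeError("receiver must be a string")
    method = STRING_METHODS.get(name)
    if method is None:
        raise NameError(f"unsupported String method: {name!r}")
    return method(receiver, *args)
```

Anything outside the table, such as `constructor`, simply raises. That is the same "doesn't respect prototypes" limitation esprehn describes, and also why you can convince yourself the script cannot do much.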
Embedding a whole JS engine and then interoperating with it from Python would be non-trivial. Good luck fixing any bugs or corner cases you hit that way. The V8 and SpiderMonkey embedding APIs are both C++ (IIRC) and non-trivial to use correctly.
Having full control like this, with simple code, is probably lower risk and more maintainable, even if there's the challenge of expanding the feature set if the scripts change.
The alternative would be a console JS shell, but those are very different from browsers, so that poses its own challenges.
youtube-dl targets a lot of websites other than Google properties, many of which are a lot sketchier (think, uh, NSFW streaming sites).
That’s the exact same logic I hear from developers who say things like:
Why do I need a full XML parser when I can just extract what I need with regex?
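To make the regex-versus-parser analogy concrete, here is a small example on a made-up XML snippet (FEED, naive_title and parsed_title are illustrative names). The regex picks up a commented-out element, misses one with an attribute, and would not decode entities. The standard-library parser handles all three.

```python
import re
import xml.etree.ElementTree as ET

# Made-up feed: a commented-out <title>, an attribute, and an escaped entity.
FEED = '<feed><!-- <title>old</title> --><title type="text">Current &amp; live</title></feed>'

def naive_title(doc):
    """Regex extraction: ignores comments, attributes, and entity escaping."""
    m = re.search(r"<title>(.*?)</title>", doc)
    return m.group(1) if m else None

def parsed_title(doc):
    """Real XML parsing: comments dropped, entities decoded, attributes allowed."""
    el = ET.fromstring(doc).find("title")
    return el.text if el is not None else None

print(naive_title(FEED))   # old  (matched inside the comment)
print(parsed_title(FEED))  # Current & live
```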
And:
All that RPC IDL stuff is overcomplicated, REST is so much easier because I can just write the client by hand.
OK, we've changed the title to shrink the scope of the interpreter.
Submitted title was "YouTube-dl has a JavaScript interpreter written in 870 lines of Python".
Hence why HN is better than Twitter.
The number of high-engagement, just-plain-wrong tweets there is just sad.
It quacks like a duck at midnight, but it's actually a frog?
I suppose this means it would be easy for YouTube to fuck with youtube-dl simply by throwing in more features of JS?
Cat, meet mouse.
It's unfortunate, https://github.com/mpv-player/mpv/issues/8655#issuecomment-1...:
> Youtube now throttles requests of more than 10MB at a time, yt-dlp works around it by making many requests of 10MB using Range HTTP headers (yt-dlp calls it the http-chunk-size), but ffmpeg which does the downloading for mpv doesn't support that yet.
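The chunking workaround described in the quote amounts to splitting one download into consecutive byte ranges. A minimal sketch, assuming a 10 MiB default chunk size and a placeholder URL; range_headers and chunked_requests are hypothetical helpers, and this is not yt-dlp's implementation of http-chunk-size:

```python
import urllib.request

def range_headers(total_size, chunk_size=10 * 1024 * 1024):
    """Yield HTTP Range header values covering bytes 0..total_size-1 in fixed-size chunks."""
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1  # Range end offsets are inclusive
        yield f"bytes={start}-{end}"

def chunked_requests(url, total_size, chunk_size=10 * 1024 * 1024):
    """Build (but do not send) one GET request per chunk."""
    return [
        urllib.request.Request(url, headers={"Range": value})
        for value in range_headers(total_size, chunk_size)
    ]
```

Each request would then be sent in order (for example with urllib.request.urlopen) and the bodies appended to one file; a server that honors ranges answers with 206 Partial Content.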
I want to change mpv or yt-dlp to support range-based video URLs (e.g. appending &range=333999644-335298975&rn=5&rbuf=0 to URLs), which would speed up stream seeking and probably eliminate throttling altogether, but I haven't taken the time to look into how to achieve it. For anyone interested, I have an open bug report at https://github.com/mpv-player/mpv/issues/10601, and I have found https://satadalsengupta.github.io/docs/papers/2017_nossdav_y... describing these parameters.
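Appending those parameters to an existing video URL is straightforward with the standard library. This is a hypothetical helper using the parameter names and values quoted above and a placeholder URL; whether the server honors them, or whether this actually avoids throttling, is not established here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_range_params(url, start, end, rn=5, rbuf=0):
    """Return `url` with range/rn/rbuf query parameters set (replacing any existing ones)."""
    parts = urlsplit(url)
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in ("range", "rn", "rbuf")
    ]
    query += [("range", f"{start}-{end}"), ("rn", str(rn)), ("rbuf", str(rbuf))]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Dropping any existing range/rn/rbuf first means the helper can be called repeatedly while seeking without piling up duplicate parameters.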
This really isn't fair. Just because it doesn't faithfully implement whatever standard JavaScript is on doesn't mean it isn't an interpreter. All an interpreter is, is something that executes a script directly rather than requiring compilation. It is a de facto interpreter for a subset of JavaScript. Nothing more, nothing less. The title could be clearer, however.
esprehn didn't say it isn't an interpreter. They're saying it is an interpreter and what it's interpreting isn't (all of) JS. That's also what you're saying, so you're agreeing with esprehn.
Edit: You misunderstood baobabKoodaa in the same way. Nobody is arguing about what constitutes an interpreter, except you. The question is what language is being interpreted.
Before accusing someone of pedantry, it would first be good not to completely misread them.
There's a huge difference between an interpreter for "JavaScript" and an interpreter for a "subset of JavaScript".
Making a pedantic argument on what constitutes an interpreter is silly. The title is bad. It is an interpreter. I'll continue to eat downvotes on this because of the pedantry of HN.