
Comment by M30

3 years ago

How should a programming noob interpret this? Be impressed at what was achieved here? Be concerned about security implications using the tool? Something else entirely?

This is the compiler writer equivalent of parsing HTML with regex:

It is technically wrong - it isn't a sufficiently rich and powerful approach to handle all JS (HTML) that you might throw at it. It'll work for a while until it eventually barfs when you least expect it.

EXCEPT that if the inputs you are giving it come from some understood source(s) that aren't likely to change, then a simpler approach than the "all singing, all dancing" correct one may be appropriate and justified, e.g. because it might be easier to write, easier to maintain, and/or expose less attack surface.

  • > some understood source(s) that aren't likely to change

    Does that apply to YouTube? Or any of the other hundreds of supported sites?

    • Presumably yes: it gets tested against those sites, and the JS doesn't change that much, so it can be fixed or adjusted as required.

It's an extremely tiny subset of JS—as an example, the only object that can be instantiated is Date. Anything other than "Date" after "new" throws an exception.

It's definitely neat, but not especially useful outside of the confines of its current application, and the security concerns of such a tiny subset will be minimal.

  • > Anything other than "Date" after "new" throws an exception

    It's even very sensitive to white space.
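To make the "only Date after new" and whitespace points concrete, here is a minimal sketch of how a regex-driven evaluator ends up that narrow. It is a hypothetical toy, not youtube-dl's actual code: the names `eval_new` and `UnsupportedJS`, the pattern, and the choice to accept only a single millisecond argument are all assumptions made for illustration.

```python
import re
from datetime import datetime, timezone

# Hypothetical toy, NOT youtube-dl's actual code.
# Exactly one space is allowed after `new`, and no space before `(`.
_NEW_RE = re.compile(r'new (?P<name>[a-zA-Z_$][\w$]*)\((?P<args>[^)]*)\)$')


class UnsupportedJS(Exception):
    """Raised for any input outside the tiny supported subset."""


def eval_new(expr):
    """Evaluate `new Date(<milliseconds>)`; reject everything else."""
    m = _NEW_RE.match(expr)
    if m is None:
        # Unexpected shape, including unexpected whitespace.
        raise UnsupportedJS('cannot parse: %r' % expr)
    if m.group('name') != 'Date':
        raise UnsupportedJS('unsupported constructor: %s' % m.group('name'))
    args = [int(a) for a in m.group('args').split(',') if a.strip()]
    if len(args) != 1:
        raise UnsupportedJS('only new Date(<milliseconds>) is handled')
    # JS timestamps are in milliseconds; Python's are in seconds.
    return datetime.fromtimestamp(args[0] / 1000, tz=timezone.utc)
```

With this sketch, `eval_new('new Date(0)')` returns the Unix epoch, while `eval_new('new Map()')` and `eval_new('new  Date(0)')` (two spaces) both raise `UnsupportedJS`. Failing loudly on anything unexpected is what keeps the attack surface small.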

The "interpreter" in the youtube-dl source is probably safe from a security standpoint.

yt-dlp seems to support running JavaScript in a full JavaScript interpreter/headless browser called PhantomJS, though. Running JavaScript in a full interpreter like this is a lot scarier from a security standpoint. I am not sure whether PhantomJS sandboxes the JavaScript evaluation from the rest of the system, and if it does, whether the sandbox actually works properly at all. It looks like the project is not being maintained, which is another bad sign.

Big projects with lots of manpower behind them, such as Chromium, have trouble keeping JavaScript evaluation safe, so I would really suggest not trusting PhantomJS on untrusted input.

The goal of youtube-dl is to download a video off of YouTube for offline storage.

This isn't something YouTube particularly enjoys. They would rather you keep coming back -- every visit is more ad revenue for them. If you have an offline copy, you don't need to visit YouTube anymore.

YouTube has an incentive, therefore, to make it more difficult to download (or "scrape") their content.

I'm not particularly sure of the specific details, but apparently YouTube has added JavaScript (a programming language that executes in the browser) as a hurdle to jump over. A simple Python script doesn't have enough brains to execute JavaScript, only enough to realize that it exists. (Clearly, youtube-dl is sophisticated enough to have jumped over it.)

These are the conclusions I come to, having written software for about a decade.

1) Once you give information to someone, be it text, pictures, sound, or video -- they will do whatever they want with it, and you have no control. Oh, yes -- it may be illegal. Maybe unethical. But the fact of the matter is you do not have control over information once it leaves your hands.

2) Adding hurdles to make it harder to access the information does little to stop someone who is dedicated to accessing it.

3) Implementing a subset of JavaScript in such an elegant and tiny manner is quite impressive.

How you interpret these facts depends on your worldview. If you are a media and content creator, you will view these facts differently than a politician or a teenager would.

As an engineer and amateur philosopher, I certainly support the rights of content creators to be paid for their work. And yet, I fear that more and more, content creators want to lease me a right to listen to their music, instead of letting me own a copy of it.

I used to own CDs, DVDs, movies, and books. What happens if Amazon or YouTube decides to not serve me anymore? Anything I've "purchased" from them, I lose access to.

Furthermore, if I create a song, I used to be able to burn copies onto CDs and distribute them on street corners. Now, you have to sign up to stream on Spotify. This is a double-edged sword -- I get a wide audience, but Spotify will do whatever they want with me.

This troubles me.

> How should a programming noob interpret this?

The browser is client-facing, and everything that runs there can be reverse engineered and figured out. So if you design a web-based application and depend on client-side JavaScript for any security or distribution enforcement, it can help, but it can ultimately be unwound and cracked, even if obfuscated.
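As a rough illustration of why client-side enforcement can be unwound: whatever transformation the page's JavaScript applies to a token, a client can read the script and replay the same steps elsewhere. The sketch below is entirely hypothetical. The operations, their order, and names such as `STEPS` and `transform` are assumptions, not any real site's scheme.

```python
# Hypothetical example: replaying a client-side token scramble in Python.
# The step list stands in for whatever a site's shipped JS happens to do.

def reverse(chars, _):
    """Reverse the whole sequence; the argument is unused."""
    return chars[::-1]


def swap(chars, n):
    """Swap the first character with the one at index n (mod length)."""
    chars = chars[:]
    i = n % len(chars)
    chars[0], chars[i] = chars[i], chars[0]
    return chars


def drop(chars, n):
    """Discard the first n characters."""
    return chars[n:]


# Assumed recipe; in practice it would be read out of the site's script.
STEPS = [(reverse, 0), (swap, 3), (drop, 2)]


def transform(token):
    """Apply every step in order, exactly as the browser would."""
    chars = list(token)
    for op, arg in STEPS:
        chars = op(chars, arg)
    return ''.join(chars)


print(transform('abcdefgh'))  # prints "fhdcba"
```

Obfuscating the script only hides the recipe. The browser still has to receive and run it, so a determined client can recover the steps too.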

> Be impressed at what was achieved here?

Yes. Try to download a YouTube video without it, or without an online service that is probably using it internally.

  • Youtube-dl is impressive. This particular hack is not.

    • youtube-dl as a whole is not particularly impressive either. It’s a big pile of unresolved technical debt, of hacks-upon-hacks and quick-and-dirty temporary solutions just like this one staying there for years.

In the face of weird shit like this, I give you the permission to go with your gut.