Overview of the control flow (already known):
The YouTube API provides you with n, your video access token.
If their new changes apply to your client (they do for "web"), then your client is expected to modify n based on internal logic. That logic lives inside player...base.js.
n is modified by a cryptic function.
The modified n is sent back to the server as proof that you're an official client. If you send n unmodified, the server will eventually throttle you.
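A minimal sketch of that handshake in Python. The transform below is entirely invented for illustration; the real one lives in YouTube's obfuscated player JS and changes regularly:

```python
# Toy sketch of the n-parameter handshake described above.
# toy_n_transform is a made-up stand-in for the cryptic player function.

def toy_n_transform(n: str) -> str:
    """Hypothetical stand-in: reverse the token and shift each character."""
    return "".join(chr((ord(c) + 1) % 128) for c in reversed(n))

def build_video_url(base_url: str, n: str) -> str:
    # A client that wants full speed must send the *transformed* n,
    # not the raw token the server handed out.
    return f"{base_url}&n={toy_n_transform(n)}"

url = build_video_url("https://example.invalid/videoplayback?id=abc", "dQw4w9")
print(url)
```

The point is only the shape of the exchange: server issues n, client runs the current transform, server checks the result before serving full-speed video.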
They need to run a JavaScript function to download YouTube videos at normal speeds.
Edit: it's also required to download music, otherwise it will just fail
Source:
- https://github.com/ytdl-org/youtube-dl/issues/29326#issuecom...
- https://github.com/ytdl-org/youtube-dl/blob/d619dd712f63aab1...
- https://github.com/ytdl-org/youtube-dl/commit/cf001636600430...
Wow:
> So they can always change the function to keep you on your toes, hence you need to be able to run semi-arbitrary JS in order to keep using the API.
A waste of human brainpower, but I guess that energy is better spent imagining a world where Google isn't in charge than on kvetching about what they're doing with their influence.
There is a reason Google is able to serve that amount of video bandwidth, and also a reason why there are no worthwhile YouTube clones. Some amount of scrape protection is absolutely essential.
I'd have to read up on the specifics as well, but I think YouTube basically uses a lot of obfuscated, rapidly and automatically changing JavaScript code to fetch the video data. A project like youtube-dl has to run this code to be able to download videos, because that's what happens in the browser as well.
For those interested further: over the past few weeks, youtube-dl stopped working intermittently for multiple hours at a time, and it was precisely because of this code.
We have a custom-made Discord music bot on our server which uses ytdl to stream songs so we can listen together, and at one point we were listening and suddenly got some obscure JavaScript error.
We began joking that there's some bug in the code which breaks it after 6 PM, but later found out that Google had changed some of the obfuscated JS. That change broke this part of the code and prevented us from fetching the song information.
If you start a YouTube video, then pause it and resume a few days later, you'll notice that the page plays for ~30 seconds (i.e. what's buffered) and then refreshes. I'd guess this refresh is to pick up the new JavaScript and any updates to the HTML.
It's kinda annoying if you have a lot of YouTube tabs open for a long time and come back to them.
What's interesting is that it seems to be a constant game of cat and mouse. I download a YT video; it crawls. I update yt-dlp; it flies again. I love yt-dlp and use it a lot.
But why not just use a normal JS engine called from Python?
It's used in the YouTube extractor: https://github.com/ytdl-org/youtube-dl/blob/d619dd712f63aab1...
I believe YouTube limits your bitrate if you don't pass a specific calculated value; it's possible youtube-dl has to parse and eval JS to get it.
> I believe YouTube limits your bitrate if you don't pass a specific calculated value
It's starting to become Widevine bullshit all over again.
It's their platform. They can do with it what they want.
You need to run some obfuscated JS to get decent download speeds from YouTube. Something along the lines of PoW (proof of work).
It's not like proof of work at all. It's just a challenge and response: YouTube includes a random number in the webpage for each video, and expects to see a request parameter with a particular value calculated from that random number when you request the video. If you don't do the arithmetic, it throttles you to ~50 KB/s.
Since the calculation of the response is done in JS, and they occasionally change the formula, some download programs are moving towards running the JS rather than trying to keep up with the changes.
It's really just bullshit to make people's lives harder.
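The "run the JS" approach typically starts by locating the transform function inside the player script before handing it to a JS interpreter. A rough sketch of that first step, where both the player-JS snippet and the regex are fabricated for illustration (the real player code differs and changes):

```python
import re

# Fabricated stand-in for a fragment of YouTube's player JS.
player_js = 'var Xq=function(a){a=a.split("");a.reverse();return a.join("")};'

# Illustrative pattern: grab the function name and body so a downloader
# could evaluate the body in a JS interpreter instead of reimplementing it.
match = re.search(r'var (\w+)=function\(a\)\{(.+?)\};', player_js)
func_name, func_body = match.group(1), match.group(2)
print(func_name)
```

When the formula changes, only the extracted body changes; the downloader's own code can stay the same, which is exactly why running the JS beats chasing each new formula.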
Next step will probably be moving the calculation to WebAssembly, or requiring the script to fetch the result via WebSocket or WebRTC...
Pirate determination is a thing to behold, as are crazed, repetitive digital grabs. It's not a fair or accurate characterization to dismiss it as "making people's lives harder". It is remarkable that the Debian distros now include ytdl; let's do what is reasonable to make it continue.
YouTube PM: We need to stop youtube-dl.
Engineers: make a half-arsed attempt.
IIRC it's used to extract/generate the signatures needed for YouTube media URLs.
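Historically, youtube-dl reconstructed the signature scrambling as a pipeline of a few simple operations (reverse, swap a character, drop a prefix) read out of the player JS. A toy descrambler under that assumption; the specific operation order here is invented:

```python
# Toy signature descrambler. The three primitives mirror the kinds of
# operations youtube-dl has extracted from player JS; the pipeline in
# descramble() is hypothetical.

def reverse(sig: str) -> str:
    return sig[::-1]

def swap(sig: str, i: int) -> str:
    # Swap the first character with the i-th one.
    lst = list(sig)
    lst[0], lst[i % len(lst)] = lst[i % len(lst)], lst[0]
    return "".join(lst)

def splice(sig: str, i: int) -> str:
    # Drop the first i characters.
    return sig[i:]

def descramble(sig: str) -> str:
    # Invented operation order; in practice the order is parsed from the JS.
    sig = splice(sig, 2)
    sig = swap(sig, 3)
    return reverse(sig)

print(descramble("abcdefgh"))
```

Because each new player version just recombines primitives like these, a downloader that can parse (or execute) the JS pipeline survives the churn.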