Comment by Andrews54757

10 months ago

Nsig/sig - Special tokens which must be passed to API calls, generated by code in base.js (player code). This is what has broken for yt-dlp and other third party clients. Instead of extracting the code that generates those tokens (eg using regular expressions) like we used to, we now need to run the whole base.js player code to get these tokens because the code is spread out all over the player code.

PoToken - Proof of origin token which Google has lately been enforcing for all clients, or video requests will fail with a 403. On android it uses DroidGuard, for IOS, it uses built in app integrity apis. For the web it requires that you run a snippet of javascript code (the challenge) in the browser to prove that you are not a bot. Previously, you needed an external tool to generate these PoTokens but with the Deno change yt-dlp should be capable of producing these tokens by itself in the near future.

SABR - Server side adaptive bitrate streaming, used alongside Google's UMP protocol to allow the server to have more control over buffering, given data from the client about the current playback position, buffered ranges, and more. This technology is also used to do server-side ad injection. Work is still being done to make 3rd party clients work with this technology (sometimes works, sometimes doesn't).

Nsig/sig extraction example:

- https://github.com/yt-dlp/yt-dlp/blob/4429fd0450a3fbd5e89573...

PoToken generation:

- https://github.com/yt-dlp/yt-dlp/wiki/PO-Token-Guide

- https://github.com/LuanRT/BgUtils

SABR:

- https://github.com/LuanRT/googlevideo

EDIT2: Addeded more links to specific code examples/guides

99 comments

Andrews54757

ACCount37 10 months ago

If you ever wondered why the likes of Google and Cloudflare want to restrict the web to a few signed, integrity-checked browser implementations?

Now you know.

jasode 10 months ago
>If you ever wondered why the likes of Google and Cloudflare want to restrict the web
I disagree with the framing of "us vs them".
It's actually "us vs us". It's not just us plebians vs FAANG giants. The small-time independent publishers and creators also want to restrict the web because they don't want their content "stolen". They want to interact with real humans instead of bots. The following are manifestations of the same fear:
- small-time websites adding Anubis proof-of-work
- owners of popular Discord channels turning on the setting for phone # verification as a requirement for joining
- web blogs wanting to put a "toll gate" (maybe utilize Cloudflare or other service) to somehow make OpenAI and others pay for the content
We're long past the days of colleagues and peers of ARPANET and NFSNET sharing info for free on university computers. Now everybody on the globe wants to try to make a dollar, and likewise, they feel dollars are being stolen from them.
- btown 10 months ago
  
  But this, too, skips over some nuance. There are a few types of actors here:
  - small content creators who want to make their content accessible to individuals
  - companies that want to gobble up public data and resell it in a way that destroys revenue streams for content creators
  - gatekeepers like Cloudflare who want to ostensibly stop this but will also become rent-extractors in the process
  - users who should have the right to use personal tools like yt-dlp to customize their viewing experience, and do not wish to profit at the expense of the creators
  We should be cautious both that the gatekeepers stand to profit from their gatekeeping, and that their work inhibits users as well.
  If creators feel this type of user (often a dedicated fan and would-be promoter) is a necessary sacrifice to defend against predatory data extractors… then that’s absolutely the creator’s choice, but you can’t say there’s a unified “us” here.
  
  1 reply →
- skydhash 10 months ago
  
  > small-time websites adding Anubis proof-of-work
  Those were already public. The issue is AI bot ddos-ing the server. Not everyone has infinite bandwith.
  > owners of popular Discord channels turning on the setting for phone # verification as a requirement for joining
  I still think that Discord is a weird channel for community stuff. There's a lot of different format for communication, but people are defaulting to chat.
  > web blogs wanting to put a "toll gate" (maybe utilize Cloudflare or other service) to somehow make OpenAI and others pay for the content
  Paid contents are good (Coursera, O'Reilly, Udemy,...). But a lot of these services wants to have free powered by ads (for audience?).
  ---
  The fact is, we have two main bad actors: AI companies hammering servers and companies that want to centralize content (that they do not create) by adding gatekeeping extension to standard protocols.
- bayindirh 10 months ago
  
  > Now everybody on the globe wants to try to make a dollar, and likewise, they feel dollars are being stolen from them.
  I'm not in it for the dollar. I just want the licenses I put on my content/code to be respected, that's all. IOW, I don't what I put out there to be free forever (as in speech and beer) to be twisted and monetized by the people who re in this for the dollar.
- pryelluw 10 months ago
  
  I don’t feel like dollars are stolen from me. It’s more of companies abusing my goodwill to publish information online. From higher bills as a result of aggressive crawling, to copying my work and removing all copyright/licensing from the code. Sure, fair use and all, but when they return the same exact code it just makes me wonder.
  Nowadays, producing anything feels like being the cows udder.
- jrochkind1 10 months ago
  
  i want my content borrowed/shared, and I still need to be engaged in this stuff because the poorly behaved distributed bots that have arisen in the past year are trying to take boundless resources from my site(s), that I cannot afford.
- jrm4 10 months ago
  
  Then some of those small people are wrong too.
  I wish we could all just stop fighting the truth of the tech -- it costs ZERO to make copies of things, and adjust accordingly.
  Patreon (and keep it real, OnlyFans) are roughly the only viable long term models.
- einpoklum 10 months ago
  
  > The small-time independent publishers and creators also want to restrict the web because they don't want their content "stolen".
  I'm sure some music creators may have, years ago, been against CD recorders, or platforms like Napster or even IRC-based file transfer for sharing music. Hell, maybe they were even against VCRs back in the day. But they were misguided at best.
  People who want to prevent computer users from freely copying data are, in this context at least, part of "them" rather than "us".
- bitwize 10 months ago
  
  Duh. I've known this for decades. The biggest advocates for DRM I've known are small-time content creators: authors, video producers, musicians. They've been saying the same thing since the 90s: without things like DRM, their stuff would be pirated, and they'd like to earn a living doing what they love instead of grinding at a day job to support themselves while everybody benefits from their creative output. In addition, major publishers and record labels won't touch stuff that's been online because of the piracy risk. They don't want to make an investment in smaller creators without a return in the form of sales of copies. That last bit is less true of music now than it used to be because of streaming and stuff, but the principle still applies.
  This is why the DMCA will never be repealed, DRM will never go away, and there is no future for general purpose computing. People want access digital content, but the creators of that content wouldn't release it at all if they knew that it could be copied endlessly by whomever receives it.
  
  2 replies →
- krageon 10 months ago
  
  It's us vs them. What big corps want is fundamentally adversarial due to it's motivation. I like to think that humans can conceptually not be your enemy.
- mschuster91 10 months ago
  
  > The small-time independent publishers and creators also want to restrict the web because they don't want their content "stolen"
  ... or just keep their site on the Internet. There hasn't been any major progress on sanctioning bad actors - be it people running vulnerable IoT crap that ends up being taken over by a botnet, cybercriminals and bulletproof hosters, or nation state actors. As long as you don't attack targets from your own geopolitical class (i.e. Russians don't attack Russians, a lot of malware will just quit if it spots Russian locale), you can do whatever the fuck you want.
  And that is how we end up with darknet services where you can trivially order a DDoS taking down a website you don't like or, if you manage to get your opponent's IP leaked during an online game, their residential IP address. Pay with whatever shitcoin you have, and no one is any wiser who the perpetrator is.
- mrguyorama 10 months ago
  
  >The small-time independent publishers and creators also want to restrict the web
  Oh really? Does Linus's Floatplane go to this extent to prevent users from downloading stuff? Does Nebula? Does whatever that gun youtuber's version of video site do this?
  Does Patreon?
- johnebgd 10 months ago
  
  It’s like we are living in an affordability crisis and people are tired of 400 wealthy billionaires profiting from peoples largess in the form of free data/tooling.
- greenavocado 10 months ago
  
  When Nixon slammed the gold window shut so Congress could keep writing blank checks for Vietnam and the Great Society, it wasn't just some monetary technicality. It was the moment America broke its word to the world and broke something fundamental in us too. Suddenly money wasn't something you earned through sweat or innovation anymore. It became something politicians and bankers could conjure from thin air whenever they wanted another war, another corporate bailout, another vote-buying scheme.
  Fast forward fifty years and smell the rot. That same fiscal recklessness Congress spending like drunken sailors while pretending deficits don't matter has bled into every pore of society. Why wouldn't it? When BlackRock scoops up entire neighborhoods with Fed-printed cash while your kid can't afford a studio apartment, people notice. When Tyson jacks up chicken prices to record profits while diners can't afford bacon, people feel it. And when some indie blogger slaps a paywall on their life's work because OpenAI vacuumed their words to train ChatGPT? That's the same disease wearing digital clothes.
  We're all living in Nixon's hangover. The "us vs us" chaos you see Discord servers demanding your phone number, small sites gatekeeping against bots, everyone scrambling to monetize scraps that's what happens when trust evaporates. Just like the dollar became Monopoly money after '71, everything feels devalued now. Your labor? Worth less each year. Your creativity? Someone's AI training fuel. Your neighborhood? A BlackRock asset on a spreadsheet.
  And Washington's still at it! Printing trillions to "save the economy" while inflation eats your paycheck alive. Passing trillion-dollar "infrastructure bills" that somehow leave bridges crumbling but defense contractors swimming in cash. It's the same old shell game: socialize the losses, privatize the gains. The factory worker paying $8 for eggs understands this. The nurse getting lectured about "wage spirals" while hospital CEOs pocket millions understands this. The teenager locking down their Discord because bots keep spamming scams? They understand this.
  Weimar happened when money became meaningless. 1971 happened when promises became meaningless. What you're seeing now the suspicion, the barriers, the every-man-for-himself hustle is what bubbles up when people realize the whole system's running on fumes. The diner owner charging $18 for a burger isn't greedy. The blogger blocking AI scrapers isn't a Luddite. They're just building levees against a flood Washington started with a printing press half a century ago.
  The tragedy is that we're all knee-deep in the same muddy water, throwing sandbags at each other while the real architects of this mess the political grifters, the Fed bankers, the extraction-engine capitalists watch dry-eyed from their high ground. Until we stop accepting their counterfeit money and their counterfeit promises, we'll keep drowning in this rigged game. The gold window didn't just close in '71. The whole damn social contract rusted shut.
  
  20 replies →
mtrovo 10 months ago
I don't know, it's really hard to blame them. In a way, the next couple of years are going to be a battle to balance easy access to info with compensation for content creators.
The web as we knew it before ChatGPT was built around the idea that humans have to scavenge for information, and while they're doing that, you can show them ads. In that world, content didn't need to be too protected because you were making up for it in eyeballs anyway.
With AI, that model is breaking down. We're seeing a shift towards bot traffic rather than human traffic, and information can be accessed far more effectively and, most importantly, without ad impressions. So, it makes total sense for them to be more protective about who has access to their content and to make sure people are actually paying for it, be it with ad views or some other form of agreement.
- SV_BubbleTime 10 months ago
  
  Don’t worry!
  Ads are coming to AI. The big AI push next will be context, your context all the time. Your phone will “help” and get all your data to OpenAI…
  “It looks like you went for a run today? Good job, you deserve a treat! Studies show a little ice cream after a long run is effectively free calories! It just so happens the nearest Dairy Queen is running a promotion just for the next 30 minutes. I’m getting you directions now.”
  
  7 replies →
- chrisweekly 10 months ago
  
  I think your point is valid, but FTR the "shift" happened long before ChatGPT; bot traffic has exceeded that of humans for over a decade.
th0ma5 10 months ago

Weird people talking about small time creators wanting DRM I've never seen that... Usually they'd be hounding for any attention? I don't know why multiple accounts are seemingly independently bringing this up, but maybe it is trying to muddy the waters? This concept?
supriyo-biswas 10 months ago
At least for YouTube, viewbotting is very much a thing, which undermines trust in the platform. Even if we were to remove Google ads from the equation, there’s nothing preventing someone from crafting a channel with millions of bot-generated views and comments, in order to paid sponsor placements, etc.
The reasons are similar for Cloudflare, but their stances are a bit too DRMish for my tastes. I guess someone could draw the lines differently.
- ACCount37 10 months ago
  
  If any of this was done to combat viewbotting, then any disruption to token calculation would prevent views from being registered - not videos from being downloaded.
  
  3 replies →
- wzdd 10 months ago
  
  Youtube has already accounted for this by using a separate endpoint to count watch stats. See the recent articles about view counts being down attributed to people using adblockers.
  Even if they hadn't done that, you can craft millions of bot-sponsored views using a legitimate browser and some automation and the current update doesn't change that.
  So I'd say Occam's razor applies and Youtube simply wants to be in control of how people view their videos so they can serve ads, show additional content nearby to keep them on the platform longer, track what parts of the video are most watched, and so on.
- rwmj 10 months ago
  
  I'm sure that's a problem for Youtube. What does it have to do with me rendering Youtube videos on my own computer in the way I want?
  
  5 replies →
- imiric 10 months ago
  
  Like another comment mentioned: that's a problem for YouTube to solve.
  They pay a lot of money to many smart people who can implement sophisticated bot detection systems, without impacting most legitimate human users. But when their business model depends on extracting value from their users' data, tracking their behavior and profiling them across their services so that they can better serve them ads, it goes against their bottom line for anyone to access their service via any other interface than their official ones.
  This is what these changes are primarily about. Preventing abuse is just a side benefit they can use as an excuse.
- sporkxrocket 10 months ago
  
  As a viewer, this is not even remotely my problem.
- ForHackernews 10 months ago
  
  > which undermines trust in the platform
  What? What does this even mean? Who "trusts" youtube? It's filled with disinformation, AI slop and nonsense.
  
  2 replies →
eek2121 10 months ago

The fact you shoved Cloudflare in there shows your ignorance of the actual problems and solutions offered.
codedokode 10 months ago
There could be valid reasons for fighting downloaders, for example:
- AI companies scraping YT without paying YT let alone creators for training data. Imagine how many data YT has.
- YT competitors in other countries scraping YT to copy videos, especially in countries where YT is blocked. Some such companies have a function "move all my videos from YT" to promote bloggers migration.
- transcriptase 10 months ago
  
  >AI companies
  Like Google?
  >scraping YT without paying YT let alone creators for training data
  Like Google has been doing to the entire internet, including people’s movement, conversations, and habits… for decades?
  
  1 reply →
- toomuchtodo 10 months ago
  
  - Enforce views of ads
  (not debating the validity of this reason, but this is the entire reason Youtube exists, to sell and push ads)
- baxuz 10 months ago
  
  Then they should allow a download API for paying customers.
  
  7 replies →
- Chris2048 10 months ago
  
  Who says these are valid?
- supriyo-biswas 10 months ago
  
  Why is this being downvoted? Are people really gonna shoot the messenger and fail to why a company may be willing to protect their competitive position?
gjsman-1000 10 months ago
Everything trends towards centralization on a long enough period.
I laugh at people who think ActivityPub or Mastodon or BlueSky will save us. We already had that, it was called e-mail, look what happened once everyone started using it.
If we couldn't stop the centralization effects that occurred on e-mail, any attempt to stop centralization in general is honestly a utopian fool's errand. Regulation is easier.
- toomuchtodo 10 months ago
  
  I am a big supporter of AT Protocol, and I contribute some money to a fund to build on it. Why laugh at running experiments? Nothing will "save us," it is a constant effort as long as humans desire to use these systems to connect. Email exists today, and is very usable still as a platform that cannot be captured. The consolidation occurred because people do not want to run their own servers, so we should build for that! Bluesky and AT Protocol are experiments in building something different, with different use cases and capabilities, that also cannot be captured. Just like email. You can run your own PDS. You can run your own stack from PDS to users "end to end" if you so choose. You can pay to do both of these tasks. No one can buy this or take it away from you, if it is built on protocols instead of a platform someone can own and control.
  Regulation would be great. The EU does it well. It is lacking in the US, and will be for some time. And so we have to downgrade to technical mitigations against centralization until regulation can meet the burden.
  
  1 reply →
- numpad0 10 months ago
  
  e-mail can't handle 24/7 1k posts/sec traffic which Twitter was about. A more appropriate analogue is IRC.

Aperocky 10 months ago

And barely a few days after google did it the fix is in.

Amazing how they simply couldn't win - you deliver content to client, the content goes to the client. Could be the largest corporation of the world and we still have yt-dlp.

That's why all of them wanted proprietary walled gardens where they would be able to control the client too - so you get to watch the ads or pay up.

dylan604 10 months ago

> For the web it requires that you run a snippet of javascript code (the challenge) in the browser to prove that you are not a bot.

How does this prove you are not a bot. How does this code not work in a headless Chromimum if it's just client side JS?

Andrews54757 10 months ago
Good question! Indeed you can run the challenge code using headless Chromium and it will function [1]. They are constantly updating the challenge however, and may add additional checks in the future. I suppose Google wants to make it more expensive overall to scrape Youtube to deter the most egregious bots.
[1] https://github.com/LuanRT/BgUtils
- toomuchtodo 10 months ago
  
  LLMs solve challenges. Can we not solve these challenges with sufficiently advanced LLMs? Gemini even, if you're feeling lulz-y.
  
  3 replies →
Beretta_Vexee 10 months ago
Once JavaScript is running, it can perform complex fingerprinting operations that are difficult to circumvent effectively.
I have a little experience with Selenium headless on Facebook. Facebook tests fonts, SVG rendering, CSS support, screen resolution, clock and geographical settings, and hundreds of other things that give it a very good idea of whether it's a normal client or Selenium headless. Since it picks a certain number of checks more or less at random and they can modify the JS each time it loads, it is very, very complicated to simulate.
Facebook and Instagram know this and allow it below a certain limit because it is more about bot protection than content protection.
This is the case when you have a real web browser running in the background. Here we are talking about standalone software written in Python.
- cylemons 10 months ago
  
  How does testing rendering work? Can javascript get pixel data from the DOM
  
  3 replies →
- dylan604 10 months ago
  
  why can a bot dev not just get all of these values from the laptop's settings and hardwire the headless version to have the same values?
  
  1 reply →