Comment by jmb99
7 hours ago
Scraping text across the entire internet is orders of magnitude easier than scraping YouTube. Even ignoring the sheer volume of data (exabytes), you will simply get blocked at the IP and account level before you make a reasonable dent. Even if you controlled the entire IPv4 space, I’m not sure you could scrape all of YouTube without getting every single address banned. IPv6 makes address bans harder, true, but then you’re still left with the problem of actually transferring and then storing that much data.
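To put a rough number on the transfer problem, here is a back-of-envelope sketch. The figures are assumptions for illustration only: a single exabyte (the comment only says "exabytes") and a hypothetical 10 Gbit/s link that stays fully saturated the whole time.

```python
# Back-of-envelope: how long does it take to move a given amount of data
# over a link of a given speed? All inputs below are illustrative assumptions.

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def transfer_years(total_bytes, link_bits_per_second):
    """Years needed to transfer total_bytes over a fully saturated link."""
    seconds = total_bytes * 8 / link_bits_per_second
    return seconds / SECONDS_PER_YEAR


EXABYTE = 10**18                # assumed size: one decimal exabyte
LINK_10_GBIT = 10 * 10**9       # assumed link: 10 Gbit/s, never idle

years = transfer_years(EXABYTE, LINK_10_GBIT)
print(f"{years:.1f} years per exabyte at 10 Gbit/s")
```

Under those assumptions a single exabyte takes roughly 25 years on one link, before any blocking or storage cost enters the picture.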
For now, you actually get pretty far with Tor. Just reset your connection when you hit an IP ban by sending SIGHUP to the Tor daemon.
I did that when I was retraining Stable Audio for fun, and it really turned out to be trivial enough to pull off as a little evening side project.
IPv6 doesn't really make address bans "harder," since they would typically ban whole /48 prefixes.
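A minimal sketch of why a /48 ban makes address rotation pointless, using Python's standard `ipaddress` module. The prefix and addresses are made up, taken from the 2001:db8::/32 documentation range, and `is_banned` is a hypothetical helper, not anyone's actual blocking logic.

```python
import ipaddress

# Hypothetical banned prefix (documentation range, not a real network).
banned = ipaddress.ip_network("2001:db8:1234::/48")


def is_banned(addr, prefix=banned):
    """True if addr falls anywhere inside the banned prefix."""
    return ipaddress.ip_address(addr) in prefix


# Two very different-looking addresses inside the same /48 allocation.
same_site_a = "2001:db8:1234:1::1"
same_site_b = "2001:db8:1234:ffff::beef"
# An address in a neighbouring /48.
other_site = "2001:db8:1235::1"

print(is_banned(same_site_a), is_banned(same_site_b), is_banned(other_site))
print(banned.num_addresses == 2**80)
```

One rule covers all 2^80 addresses in the allocation, so rotating through addresses within it gains nothing; only an address from a different /48 falls outside the ban.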