Comment by shiomiru
19 hours ago
If it's such a common issue, I would've thought Google already ignored searches from clients that do not enable JavaScript when computing results?
Besides, you already got auto-blocked when using it in a slightly unusual way. Google hasn't worked on Tor since forever, and recently I also got blocked a few times just for using it through my text browser that uses libcurl for its network stack. So I imagine a botnet using curl wouldn't last very long either.
My guess is it had more to do with squeezing out more profit from that supposed 0.1% of users.
Given that curl-impersonate[1] exists and that a major player in this space is also looking for experience with this library, I'm pretty sure forcing the execution of JS using DOM stuff would be a much more effective deterrent to prevent scraping.
[1] https://github.com/lwthiker/curl-impersonate
I have been web searching using Google from the command line, with no Javascript, for decades. Until last week I never sent a User Agent HTTP header either. After this change I'm still searching from the command line, no Javascript. Thus "requiring Javascript" is not a correct phrase to describe this change. The requirement is a User Agent HTTP header with an approved value. The only difference in searching for me as of the last few days is that I now send a User Agent HTTP header, with an "approved" string.
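For anyone wondering what "sending a User Agent HTTP header" from a script looks like, here is a minimal sketch using only Python's standard library. The search URL layout and the `EXAMPLE_UA` string are assumptions for illustration; the placeholder is not a value known to be accepted, and the comment above does not say which strings are "approved".

```python
import urllib.request
from urllib.parse import urlencode

# Hypothetical placeholder; substitute whatever User-Agent string you actually use.
EXAMPLE_UA = "ExampleBrowser/1.0"


def build_search_request(query, user_agent=EXAMPLE_UA):
    """Build (but do not send) a GET request carrying an explicit User-Agent header."""
    # Assumed URL layout: the query goes in the "q" parameter.
    url = "https://www.google.com/search?" + urlencode({"q": query})
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(request, timeout=10):
    """Send the request and return the body as text. No JavaScript is executed."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch(build_search_request("some query"))` performs a plain HTTP GET; the only thing that changes between "no User Agent" and "approved User Agent" is the one header.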
Javascript does not stop bots. At least it does not stop Googlebot.
IMO, the change is to funnel more people (cf. bots) into seeing AI-generated search results. The "AI" garbage requires Javascript. That is why the spokesperson suggests "degraded" search results for people who are not using Javascript. For me, the results are improved by avoiding the AI garbage, not degraded.
"Why didn't they do it earlier?" is a fallacious argument.
If we accepted it, there would basically only be a single point in time where a change like this could be legitimately made. If the change is made before there is a large enough problem, you'll argue the change was unnecessary. If it's made after, you'll argue the change should have been made sooner.
"They've already done something else" isn't quite as logically fallacious, but shows that you don't experience dealing with adversarial application domains.
Adversarial problems, which scraping is, are dynamic and iterative games. The attacker and defender are stuck in an endless loop of play and counterplay, unless one side gives up. There's no point in defending against attacks that aren't happening -- it's not just useless, but probably harmful, because every defense has some cost in friction to legitimate users.
> My guess is it had more to do with squeezing out more profit from that supposed 0.1% of users.
Yes, that kind of thing is very easy to just assert. But just think about it for like two seconds. How much more revenue are you going to make per user? None. Users without JS are still shown ads. JS is not necessary for ad targeting either.
It seems just as plausible that this is losing them some revenue, because some proportion of the people using the site without JS will stop using it rather than enable JS.
It can't lose them revenue. Serving queries is expensive; getting rid of bots yields immediate and direct savings measured in $$$.
I was using the word "revenue" very deliberately. Savings don't increase revenue.
The GP's argument was that this was about "squeezing out more profit from that supposed 0.1% of users". That can't be an argument about resource savings. The resource savings come from not serving bots, not from blocking legitimate users who happen to have disabled JS.
> "Why didn't they do it earlier?" is a fallacious argument.
I never said that, but admittedly I could have worded my argument better: "In my opinion, shadow banning non-JS clients from result computation would be similarly (if not more) effective at preventing SEO bots from poisoning results, and I would be surprised if they hadn't already done that."
Naturally, this doesn't fix the problem of having to spend resources on serving unsuccessful SEO bots that the existing blocking mechanisms (which I think are based on IP-address rate limiting and the UA's HTTPS fingerprint) failed to filter out.
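To make "IP-address rate limiting" concrete, here is a rough sketch of a per-IP sliding-window limiter. It only illustrates the general technique; the class name, thresholds, and structure are all hypothetical, and nothing here is claimed to reflect how Google actually blocks clients.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per IP within any window_seconds interval."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip, now=None):
        # `now` can be injected for testing; defaults to a monotonic clock.
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: reject and do not record the hit
        hits.append(now)
        return True
```

A limiter like this is keyed on IP alone, which is why it fails against bots spread over many addresses, and why it tends to catch Tor exit nodes, where many users share one address.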
> Yes, that kind of thing is very easy to just assert. But just think about it for like two seconds. How much more revenue are you going to make per user? None. Users without JS are still shown ads. JS is not necessary for ad targeting either.
Is JS necessary for ads? No. Does JS make it easier to control what the user is seeing? Sure it does.
If you've been following the developments on YouTube concerning ad-blockers, you should understand my suspicion that Search is going in a similar direction. Of course, it's all speculation; maybe they really just want to make sure we all get to experience the JS-based enhancements they have been working on :)
> Is JS necessary for ads? No.
JS is somewhat necessary for ads: it's not in any way needed for displaying them, but it is instrumental in verifying that they are actually being displayed to human beings. Ad fraud is an enormous business.