Comment by andyferris
2 months ago
I have to ask - what OS do AI-training web scrapers tend to report? (A mixture? One with > 5% Linux market share? Sorry, being a sceptic, otherwise I think this is fantastic news if accurately measured).
Most of these types of surveys do their best to filter out robots.
With over 50% of Internet traffic being robots, the results really don't make any sense at all if you don't.
Good question. Most of these headlines about Linux market share ("mind share"?) are completely uninformative about how widespread the use of Linux is in reality.
About 12 years ago, a similar headline appeared. Someone then explained that the Chinese government had recently cracked down on Windows pirating (to appease the Americans), with the result that some PC vendors had stopped including (pirated copies of) Windows with the computers they sold, shipping some Linux distro instead. But since pirated Windows install media was still widely available, a cultural practice quickly grew up in which the consumer installs Windows (or gets his more technically inclined cousin to do it for him) as soon as he gets his new PC home. The headline reported on a statistic that did not catch this practice, because it counted only the OSes on computers at the time they were sold (i.e., "OS shipments").
What's "windows pirating" when Microsoft offers public ISO downloads and you can activate them with MAS?
The details of how the Chinese PC buyer gets Windows on his new PC are irrelevant to my point (as is whether it deserves the name "pirating").
I wonder if they used Firefox to download Internet Explorer?
I tend to think that they should mostly be using their own user agent, and if not, be disguised as the most common ones to avoid being detected too easily. Web scraping has probably been running mostly under Linux since before the age of AI anyway. I'm not in the field, so if anyone has more trustworthy info on that...
Yes they run Linux, but they either have their own user agent (not included in the stats) or are spoofing a real world web browser... In which case they might be spoofing Chrome on Windows even if they run on Linux.
Either way I don't think the 5% are impacted by scraping bots.
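To make the point above concrete, here is a rough Python sketch of how a user-agent-based OS tally behaves. Everything in it is an assumption for illustration: the `BOT_MARKERS` pattern, the `classify_os` rules, and the sample strings (including the made-up `ExampleBot`) are hypothetical, and real surveys presumably use far more elaborate bot detection.

```python
import re
from collections import Counter

# Hypothetical marker list: catches only bots that announce themselves.
BOT_MARKERS = re.compile(r"bot|crawler|spider|python-requests|curl", re.IGNORECASE)


def classify_os(user_agent: str) -> str:
    """Rough OS guess from a User-Agent string (illustrative rules only)."""
    ua = user_agent.lower()
    if "windows" in ua:
        return "Windows"
    if "android" in ua:  # checked before Linux: Android UAs also say "Linux"
        return "Android"
    if "iphone" in ua or "ipad" in ua:  # checked before macOS: "like Mac OS X"
        return "iOS"
    if "mac os x" in ua or "macintosh" in ua:
        return "macOS"
    if "linux" in ua:
        return "Linux"
    return "Other"


def os_share(user_agents):
    """Share of each OS among user agents that do not self-identify as bots."""
    counts = Counter(
        classify_os(ua) for ua in user_agents if not BOT_MARKERS.search(ua)
    )
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


sample = [
    # Could be a real Windows desktop or a Linux scraper spoofing one.
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
    # Made-up self-identifying bot: dropped by the filter.
    "Mozilla/5.0 (compatible; ExampleBot/1.0; +https://example.com/bot)",
    "python-requests/2.31.0",
]

print(os_share(sample))
```

The two self-identifying entries are dropped, and the Windows entry is counted as Windows whatever machine actually sent it. So in this sketch, a spoofing scraper would inflate the Windows figure rather than the Linux one.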
None: https://platform.openai.com/docs/bots
There's no reason for those bots to report any specific OS.
Anything that's automated today is Linux. So, I'll assume almost 99.99%, or maybe BSD in some cases.
Any scraper out there that doesn't want to identify itself as such is very likely to spoof the most commonly used OS + browser combo (Chrome + Windows), regardless of what it's actually running on.
So basically the 5% number is pulled out of thin air.