Comment by oh_fiddlesticks
17 hours ago
> 1. One is building the index, which is a lot harder without a google offering its own API to boot. If other tech companies really wanted to break this monopoly, why can't they just do it?
FTA:
> Context matters: Google built its index by crawling the open web before robots.txt was a widespread norm, often over publishers’ objections. Today, publishers “consent” to Google’s crawling because the alternative - being invisible on a platform with 90% market share - is economically unacceptable. Google now enforces ToS and robots.txt against others from a position of monopoly power it accumulated without those constraints. The rules Google enforces today are not the rules it played by when building its dominance.
robots.txt was being enforced in court before Google even existed, let alone before Google got so huge:
> The robots.txt played a role in the 1999 legal case of eBay v. Bidder's Edge,[12] where eBay attempted to block a bot that did not comply with robots.txt, and in May 2000 a court ordered the company operating the bot to stop crawling eBay's servers using any automatic means, by legal injunction on the basis of trespassing.[13][14][12] Bidder's Edge appealed the ruling, but agreed in March 2001 to drop the appeal, pay an undisclosed amount to eBay, and stop accessing eBay's auction information.[15][16]
https://en.wikipedia.org/wiki/Robots.txt
Not only did eBay v. Bidder's Edge technically come after Google existed, not before; more critically, the slippery-slope interpretation of California trespass to chattels law that the District Court relied on was considered and rejected by the California Supreme Court in Intel v. Hamidi (2003), and similar logic applied to other states' trespass to chattels laws has been rejected by other courts since. eBay v. Bidder's Edge was an early aberration in the application of the law, not something that established or reflected a lasting norm.
The point is, robots.txt was definitely a thing that people expected to be respected before and during Google's early existence. This Kagi claim seems to be at least partially false:
> Google built its index by crawling the open web before robots.txt was a widespread norm, often over publishers’ objections.
Nitpick: Google incorporated in 1998, so, before the Bidder's Edge case.
A classic case of climbing the wall, and pulling the ladder up afterward. Others try to build their own ladder, and Google uses their deep pockets and political influence to knock the ladder over before it reaches the top.
Why does Google even need to know about your ladder? Build the bot, scale it up, save all the data, then release. You can now remove the ladder and obey robots.txt just like G. Just like G, once you have the data, you have the data.
Why would you tell G that you are doing something? Why tell a competitor your plans at all? Just launch your product when the product is ready. I know that's anathema to SV startup logic, but in this case it's good business.
Running the bot nowadays is hard, because a lot of sites will now block you - not just by asking nicely via robots.txt, but by checking your actual source IP. Once they see it's not Google, they send you a 403.
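To make the "asking nicely" half of that concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `NewIndexBot` name, and the example.com URL are all made up for illustration; real sites vary. It only models the robots.txt layer. The source-IP check described above happens on the server and would return a 403 regardless of what the crawler calls itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: welcomes Googlebot, turns everyone else away.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some/page"
    for bot in ("Googlebot", "NewIndexBot"):
        print(bot, "allowed" if allowed(bot, page) else "blocked")
```

A new crawler that honors this file never gets to index the page, while one that ignores it still runs into the IP-based blocking.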
Cost, presumably. From the article:
> Microsoft spent roughly $100 billion over 20 years on Bing and still holds single-digit share. If Microsoft cannot close the gap, no startup can do it alone.
True. But the thing is, if one says "We will make sure your site is in a worldwide, freely available index" that is kept fresh, Google's monopoly ship already begins to take on water. Here is an appropriate line from a completely different domain, rare earth metals, from The Economist on the Chinese government's weaponization of rare earths[1]:
> Reducing its share from 90% to 80% may not sound like much, but it would imply a doubling in size of alternative sources of supply, giving China’s customers far more room for manoeuvre.
[1] https://archive.ph/POkHZ#selection-1233.117-1233.302
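The arithmetic behind that quoted line, as a quick sketch. It assumes the total market stays the same size, so whatever the dominant player loses goes to everyone else; the function name is just for illustration.

```python
def alternative_supply_growth(old_share_pct, new_share_pct):
    """Factor by which the combined share of all alternatives grows
    when the dominant player's share moves from old to new (in percent).

    Assumes a fixed total market size.
    """
    old_rest = 100 - old_share_pct
    new_rest = 100 - new_share_pct
    if old_rest <= 0:
        raise ValueError("dominant share must be below 100% to compare")
    return new_rest / old_rest


if __name__ == "__main__":
    # 90% -> 80% means the alternatives go from 10% to 20% of the market.
    print(alternative_supply_growth(90, 80))  # 2.0
```

Applied to search: if the 90% figure from the article dropped to 80%, the combined alternatives would be twice as large as they are now, under the same fixed-market assumption.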