Comment by jgalt212
3 years ago
Unfortunately, the Open Web just makes it easier for Google to smart-snippet all the things. And it's not even about monetization (via ads) for the source content provider; it's also about giving some credit to the source.
Having Google crawl everything on an Open Web is immensely preferable to the alternative of a closed web, or no web at all. Part of uploading things to the internet is accepting that everyone can see, copy, and distribute the content you provide. It's part of authoring anything digitally, and a poor boogeyman in a world where the Open Web has few demonstrable harms.
What should really scare people is the prospect of a common interface like the internet disappearing and being monetized by private interests. We take the Open Web today for granted, and while I partly feel like Doctorow is too fatalistic, I also agree that interoperability is a core part of what makes the web function.
Even if that is true, that is not what users want. They do not want everything they have ever posted to be out on the internet with no way to delete it.
Scrapers are hostile actors against users, which is one reason social-media-style sites invest resources in defending their users against scrapers.
> Scrapers are hostile actors against users
They are also user agents: the idea that the only viable web browser is one that meets a certain preconception (like Chrome) is a violation of the fundamental design principles of the web. If I want to scrape some pages and render them in a terminal (archive them to track silent edits to important news stories), via text to speech or whatever, websites should be produced in a format that’s amenable to it: the web will be richer and more vibrant as a result.
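A scrape-and-render-to-terminal user agent of the kind described can be sketched in a few lines of standard-library Python. This is a hypothetical illustration, not any particular tool: the sample HTML string stands in for a fetched page, and the class name `TextExtractor` is invented for the example.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Minimal terminal 'browser': strips tags and keeps readable text."""
    SKIP = {"script", "style"}  # invisible content we never want to render

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # >0 while inside a <script>/<style> element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only visible, non-whitespace text runs.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

    def text(self):
        return "\n".join(self.parts)

# Hypothetical stand-in for a page fetched over HTTP.
html = ("<html><head><style>p{color:red}</style></head>"
        "<body><h1>Headline</h1><p>Story text.</p></body></html>")
p = TextExtractor()
p.feed(html)
print(p.text())
```

In practice the `html` string would come from an HTTP fetch, and archiving successive snapshots of the extracted text is enough to diff for silent edits.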
Search engine scrapers are friendly to me, in that they make it possible to find content that I want. A cool blog post is no good for me if I can't find it.
>> Part of uploading things to the internet is reconciling that everyone can see, copy and distribute the content you provide
In some ways this is a narrow definition of the Web. There is a lot of activity placed behind a login to expressly prevent the information from being public access.
If I upload a private repo to GitHub, I expect it to be private. If I interact with a doctor or lawyer on a site, I expect that to be private.
Of course, interoperability controlled by the user is different from interoperability controlled by the host, or by some external entity (a scraper). The former is good; the latter less desired.
What you're describing is privacy. A lack of interoperability would be something like uploading code to GitHub from one computer and then failing to download it to another because the encryption algorithms, or some other component, are incompatible.
Interoperability isn't the problem. Leverage to enforce your own IP, or lack thereof as an individual, is.
Just because you publish content on the Web doesn't mean you give anyone license to use it however they want. IP is rooted in a foundational principle of explicit consent. Copyleft uses that principle to state explicitly that anyone is free to use the work however they want. Without that consent, it's assumed that the author can ask you to cease and desist. (Hence why e.g. Wikipedia is plastered with Creative Commons license notices.)
Sure, there are fair use exceptions. But if you take a close look at the conditions that need to be met before a published copy can be considered fair use, it's not as clear cut as it seems.
Thing is, only big media outlets with capital, like the NYTimes, are able to litigate against big actors who wholesale misuse interoperability in a tragedy-of-the-commons fashion.
This imbalance in resources and capital to enforce rights between a handful of big actors and everyone else is exactly what Doctorow draws attention to in the interview.
> the Open Web just makes it easier for Google to smart snippet all the things
How is this a problem? A simple robots.txt rule disallowing Googlebot would solve your concern if you want to de-list from Google.
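For reference, the de-listing rule would look something like this in the site's robots.txt (Googlebot is the user-agent token Google's crawler identifies itself with; `Disallow: /` blocks the whole site):

```
User-agent: Googlebot
Disallow: /
```

Note that robots.txt is advisory: it only works against crawlers that choose to honor it.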
But then you have the tree falling in the woods problem.
Does Google have a track record of not respecting robots.txt? Otherwise why is it a problem?