Comment by theptip
3 days ago
> The entire distinction here is that as a website operator you wish to serve me ads
While this is sometimes the case, it’s not always so.
For example Fediverse nodes and self-hosted sites frequently block crawlers. This isn’t due to ads, rather because it costs real money to serve the site and crawlers are often considered parasitic.
Another example would be where a commerce site doesn’t want competitors bulk-scraping their catalog.
In all these cases you can for sure make reasonable “information wants to be free” arguments as to why these hopes can’t be realized, but do be clear that it’s a separate argument from ad revenue.
I think it’s interesting to split revenue into marginal distribution/serving costs, and up-front content creation costs. The former can easily be federated in an API-centric model, but figuring out how to compensate content creators is much harder; it’s an unsolved problem currently, and this will only get harder as training on content becomes more valuable (yet still fair use).
> it costs real money to serve the site and crawlers are often considered parasitic.
> Another example would be where a commerce site doesn’t want competitors bulk-scraping their catalog
I think of crawlers that bulk download/scrape (eg. for training) as distinct from an agent that interacts with a website on behalf of one user.
For example, if I ask an AI to book a hotel reservation, that's - in my mind - different from a bot that scrapes all available accommodation.
For the latter, ideally a common corpus would be created and maintained, AI providers (or upstart search engines) would pay to access this data, and the funds would be distributed to the sites crawled.
(never gonna happen but one can dream...)
But which hotel reservation? I want my agent to look at all available options and help me pick the best one - location vs price vs quality. How does it do that other than by scanning all available options? (Realistically Expedia has that market on lock, but the hypothetical still remains.)