Comment by Toritori12

11 days ago

Overall I agree with the idea, but it will probably be cheaper to bypass CF, considering the amount of data that big tech companies are consuming (also, Google will get it for free because of Google Search?). If it succeeds, I wonder how agents will pass this cost on to the user.

>Google will get it for free because of Google Search

What if the second step is that Google pays each page it visits? With a per-page crawler fee, news websites could make some articles uncrawlable unless a huge fee is paid. Just thinking aloud, but I could easily see a protocol that sets prices by kind of "licensing", e.g. "internal usage", "redistribution" (what Google News did/does?), "LLM training", etc. Cloudflare, acting as a central point for millions of websites, makes this possible.
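To make the "pricing by licensing" idea concrete, here is a rough sketch of what a per-page price list could look like. Everything in it is made up for illustration: the tier names just mirror the examples above, and `PagePricing`, `quote`, the path, and the dollar amounts are hypothetical, not part of any existing Cloudflare API or protocol.

```python
from dataclasses import dataclass

# Hypothetical license tiers, mirroring the kinds of usage named above.
LICENSES = ("internal_usage", "redistribution", "llm_training")


@dataclass(frozen=True)
class PagePricing:
    """Per-page price list (illustrative only, not a real protocol)."""
    path: str
    fees_usd: dict  # license name -> fee per crawl in USD; a missing key means "not offered"


def quote(pricing: PagePricing, license_name: str):
    """Return the per-crawl fee for a license, or None if the page is not offered under it."""
    if license_name not in LICENSES:
        raise ValueError(f"unknown license: {license_name}")
    return pricing.fees_usd.get(license_name)


# Example: an article that can be crawled for internal use or redistribution,
# but is not offered for LLM training at any price. All figures are invented.
article = PagePricing(
    path="/news/example-article",
    fees_usd={"internal_usage": 0.001, "redistribution": 0.05},
)
```

A crawler would ask for `quote(article, "llm_training")`, get `None` back, and know the page is off limits for that use, while the other tiers stay available at their listed fee.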

  • The question is: who has the leverage?

    If some small news website denies Googlebot crawling, it'll disappear from Google and, essentially, from the Internet. People go to great lengths to appease the Google crawler.

    If some huge news website demands fees from Google, it might work, I guess. But I'm not sure that it would work even for BBC or CNN.

    • I agree about the leverage and the small-website reasoning; some game-theory thinking is definitely needed to get something like this right. But it does feel like this enables the "unionization" of websites against scraping giants. Google is in an especially interesting position because, as you mentioned, it could blackmail you into allowing scraping in exchange for indexing.

    • If it's a smaller news site, they have already de-ranked it and used its content for AI answers.

  • It'd be a fitting solution if news sites closed the loop: crawl Google et al. to see if any of their content shows up there, then reprice future content higher for any search engine that reproduced content via genAI.
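The repricing half of that loop could be as simple as the sketch below. It assumes the detection step (crawling the search engine and spotting reproduced content) has already happened and produced a yes/no flag per crawler. The crawler names, the fees, and the 2x multiplier are all hypothetical.

```python
def reprice(current_fee: float, reproduced: bool, multiplier: float = 2.0) -> float:
    """Raise the fee if the crawler's product reproduced our content; otherwise keep it."""
    return current_fee * multiplier if reproduced else current_fee


# Hypothetical crawlers with their current per-page fees in USD.
fees = {"ExampleSearch": 0.01, "OtherBot": 0.01}

# Hypothetical result of the detection step: did we find our content reproduced?
detected = {"ExampleSearch": True, "OtherBot": False}

# Only the crawler caught reproducing content pays more next time.
new_fees = {bot: reprice(fee, detected[bot]) for bot, fee in fees.items()}
```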

More publishers will start blocking Google's bots as well, because Google is already killing their revenue with AI results.