Comment by nialse
11 days ago
Although Cloudflare CEO Matthew Prince pre-launched their new offering with a compelling speech and numbers to boot, the mechanics do not add up. There is an assumption that AI companies need to scrape the web for content. This is certainly true for new AI companies and new content, but the vast majority of scraping useful content has already been completed. In addition, new content will tend to be AI generated itself, which might not help training, and in the US, training on purchased content has recently been deemed fair use.
What problem is being solved? The perceived issues are twofold: increased crawling by AI scraping bots is causing traffic and thus additional cost, and content creators lack compensation for their work in terms of money or notoriety (according to Matt). Cloudflare has obviously traditionally focused on the first, and needing to grow, they see the potential in being a middleman in the second.
Where does this get us? Will Cloudflare's service lower traffic volumes that do not generate revenue? Absolutely. Use of the service will be perceived as a success based on this metric, and the revenue-generating traffic will stay at similar or higher levels initially. Then, if the indexed content becomes more and more stale, as AI companies may or may not be willing to pay the associated costs, revenues will slide long term. Content creators seeking fame or fortune may then seek other avenues to promote and distribute their content as they perceive the alternatives as better.
The sole hope for Cloudflare is that a couple of the large AI outlets "play ball" and make the paid-for indexed content available based on subscription fees or, god forbid, ads. However, they might then want their users to be able to access the full contents guarded by other paywalls, and not only the previews offered.
One would hope that this would lead to a future where creative humans are compensated more for their cognitive work. Unfortunately, with the trajectory we're on, that will be a select few, as the marginal cost of content is quickly approaching zero.
> This is certainly true for new AI companies and new content, but the vast majority of scraping useful content has already been completed.
For training a base model, yes, but there's another big category of AI use case: search engines. Those invocations of the model involve web searches, often during reasoning steps, and they will absolutely scrape for content.
Agreed. The question is whether new content is valuable enough. Or will we see other sources rise to the occasion? Meta, Google, X and ByteDance at least have other sources of current content which they may start to promote "for visibility". Whether these sources will be sufficient for the reasoning steps is uncertain, though.