Comment by coldpie
20 days ago
> Everything on this can be programmatically simulated by a bot with bad intentions. It will be a cat and mouse game of finding behaviors that differentiate between bot and not and patching them.
Forcing Facebook & Co to play the adversary role still seems like an improvement over the current situation. They're clearly operating illegitimately if they start spoofing real user agents to get around bot blocking capabilities.
I'm imagining a quixotic terms of service, where "by continuing" any bot access grants the site-owner a perpetual and irrevocable license to use and relicense all data, works, or other products resulting from any use of the crawled content, including but not limited to cases where that content was used in a statistical text generative model.