Comment by anthonyhn
1 year ago
First off, I want to thank you and the other members of the CC Foundation; the CC dataset is an incredible resource for everyone.
Much of the UA data, including CCBot, comes from an upstream source[0]. I was torn on whether CCBot and other archival bots should be included in the configs, since these services are not AI scraping services. I've excluded CCBot[1] and the archival services from the recommended configs.
[0] https://darkvisitors.com/agents/ccbot
[1] https://github.com/anthmn/ai-bot-blocker/commit/ae0c2c40fd08...
Thank you!