Comment by jakewil
2 days ago
Perhaps Iocaine [1] is what you're looking for. See the demo page [2] for what it serves to AI crawlers.
For images you have stuff like https://nightshade.cs.uchicago.edu/whatis.html
This site blocked me right away; it seems quite aggressive.
Seems like a good way to waste tons of your own bandwidth. Almost every serious data pipeline has some quality filtering in there (even open-source ones like FineWeb and FineWeb-Edu), and the stuff Iocaine generates gets filtered out instantly.
Feel free to test this with any classifier or cheapo LLM.
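If you want to try it, here is a rough sketch of the threshold-style filtering I mean. Everything in it is made up for illustration: `filter_by_quality`, `toy_score`, and the 0.3 cutoff are hypothetical names and values, and the toy scorer is only a stand-in for a real classifier or LLM judge. It says nothing about how Iocaine's output would actually score; swap in your own scorer to check that.

```python
from typing import Callable, Iterable, List

def filter_by_quality(
    docs: Iterable[str],
    score: Callable[[str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Keep only the documents whose score is at or above the threshold."""
    return [doc for doc in docs if score(doc) >= threshold]

# Toy stand-in scorer (hypothetical): the fraction of words that are
# common English function words. A real pipeline would call a trained
# classifier or an LLM here instead.
FUNCTION_WORDS = {"the", "a", "of", "to", "and", "is", "in", "that"}

def toy_score(text: str) -> float:
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(word in FUNCTION_WORDS for word in words) / len(words)

if __name__ == "__main__":
    samples = ["the cat is in the house", "zxq blorp fnord quux"]
    for doc in filter_by_quality(samples, toy_score, threshold=0.3):
        print(doc)
```

The scorer is passed in as a plain callable so you can replace `toy_score` with whatever classifier you have on hand without touching the filter itself.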