It's a non-default setting. So no. I'm not sure what exactly you disagree with. We can call out Bluesky when they overreach, but this is simply not it.
It's also a way to prevent LLMs from being trained on their data without their consent.
That's not correct.
The setting is mostly cosmetic and only affects the Bluesky official app and web interface. People do find this setting helpful for curbing external waves of harassment (less motivated people just won't bother making an account), but the data is public and is available on the AT protocol: https://pdsls.dev/at://robpike.io/app.bsky.feed.post/3matwg6...
So nothing is stopping LLMs from training on that data per se.
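To make the "data is public" point concrete, here is a minimal Python sketch that turns an at:// URI into a request URL for the `com.atproto.repo.getRecord` XRPC method. The host `https://bsky.social` and the helper names are my assumptions for illustration; depending on where the account is hosted, you may need to point it at the user's own PDS instead.

```python
from urllib.parse import urlencode


def parse_at_uri(at_uri):
    """Split an at:// URI into (repo, collection, rkey)."""
    prefix = "at://"
    if not at_uri.startswith(prefix):
        raise ValueError("not an at:// URI")
    parts = at_uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected at://<repo>/<collection>/<rkey>")
    return tuple(parts)


def get_record_url(at_uri, host="https://bsky.social"):
    """Build the getRecord URL. The default host is an assumption."""
    repo, collection, rkey = parse_at_uri(at_uri)
    query = urlencode({"repo": repo, "collection": collection, "rkey": rkey})
    return f"{host}/xrpc/com.atproto.repo.getRecord?{query}"
```

A plain GET on the resulting URL should return the record as JSON without a login, which is the whole point: the logged-out visibility setting is not involved at this layer.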
That's assuming that AI companies are gathering data in a smart way. The entire MusicBrainz database can be downloaded for free, but AI scrapers still try to scrape it one HTML page at a time, which often causes errors and/or slowdowns for the service.