Comment by pfraze

2 years ago

I'm on the team that implemented this, so I'm happy to answer questions. I'll give a brief technical overview.

It's broadly a system for publishing metadata on posts, called "Labels". Application clients specify which labeling services they want to use in request headers. Those labels get attached to the responses, where they can then be interpreted by the client.
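
In rough client terms, the flow looks like the sketch below; I'm treating the header name, the AppView URL, and the response shape as illustrative rather than normative, so check the docs before relying on them:

```typescript
// Sketch of the client side of the label flow. Header name and response
// shape are illustrative; consult the atproto docs for the normative versions.

type Label = {
  src: string; // DID of the labeler that emitted this label
  uri: string; // the labeled record (e.g. a post)
  val: string; // label value, e.g. "spam" or "graphic-media"
  cts: string; // creation timestamp
};

async function getPostWithLabels(postUri: string, labelerDids: string[]) {
  const res = await fetch(
    "https://public.api.bsky.app/xrpc/app.bsky.feed.getPosts?uris=" +
      encodeURIComponent(postUri),
    {
      // Tell the AppView which labeling services' labels to attach.
      headers: { "atproto-accept-labelers": labelerDids.join(", ") },
    }
  );
  const data = await res.json();
  const post = data.posts?.[0];
  // Labels ride along on the response; the client decides how to render them.
  const labels: Label[] = post?.labels ?? [];
  return { post, labels };
}
```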

This is an open system. Clients can choose which labelers they use, and while the Bluesky client hardcodes the Bluesky moderation, another client can choose a different primary Labeler. Users can then add their community labelers, which I describe below. We aim to do the majority of our moderation at that layer. There are also "infrastructure takedowns" for illegal content and network abuse, which we execute at the services layer (i.e. the relay).

Within the app this looks like special accounts you can subscribe to in order to get additional filters. The labels can be neutral or negative, which means they can also essentially function as user badges. Over time we'll continue to extend the system to support richer metadata and more behaviors, which can be used to do things like community notes or label-driven reply gates.
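
As a toy example of how a client might act on those labels, a value-to-behavior table is enough to show the neutral/negative split (none of these label names are from our actual lexicons):

```typescript
// Hypothetical mapping from label values to client behaviors. The label
// names are invented; they just illustrate negative labels vs. badges.

type LabelBehavior = "hide" | "warn" | "badge";

const labelPolicy: Record<string, LabelBehavior> = {
  spam: "hide", // negative: filter the post outright
  "graphic-media": "warn", // negative: show behind an interstitial warning
  "verified-journalist": "badge", // neutral: render as a user badge instead
};

function behaviorFor(labelVal: string): LabelBehavior | undefined {
  return labelPolicy[labelVal];
}
```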

This sounds like a good approach. Pretty much exactly this "opt-in/out, pluggable trust moderation" is something I'd thought about a number of times over the years, yet I'd never come across this relatively simple idea implemented in the real world until now.

Do you/anyone reading know of any prior work? The closest I know of is this site, in fact, which is opt-out but not pluggable. Or maybe email spam filters, from the POV of the server admin, at least.

  • There aren’t a lot of exact matches that I’m aware of. Spam filters, ad blockers, Reddit, Mastodon, and Block Party all came up during the discussions.

    • So, on Reddit, there's the problem that if you're interested in railroad trains, you have r/trains and r/trains2 and r/bettertrains and r/onlytrains due to differing moderation policies and clique drama, with plenty of duplication (and Reddit crossposts are always disconnected conversations).

      My understanding of Bluesky is that the equivalent would be a ‘feed’ tracking #trains filtered by some union or intersection of moderation teams. Is that correct?

I have been loosely following Bluesky for a while and have read some blog posts, but haven't delved super deep. Can you expand on the "infrastructure takedowns"? Do those still affect third-party clients? I am trying to understand to what degree this is a point of centralization open to moderation abuse, versus Bluesky acting as a protocol where, even if you really wanted to, you couldn't take something down anywhere other than off your own client.

  • The network can be reduced to three primary roles: data servers, the aggregation infrastructure, and the application clients. Anybody can operate any of these, but generally the aggregation infra is high scale (and therefore expensive).

    So you can have anyone fulfilling these roles. At present there are somewhere around 60 data servers, with one large one we run; one aggregation infra; and probably around 10 actively developed clients. We hope to see all of these roles expand over time, but a likely stable future will see about as many aggregation infrastructures as the Web has search engines.

    When we say an "infrastructure takedown", we mean a takedown off the aggregator and the data server we run. This is high-impact but not total. The user could migrate to another data server and then use another infra to persist. If we ever fail (on policy, as a business, etc.) there is essentially a pathway for people to displace us.
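
    If it helps, here's a purely descriptive sketch of those three roles; these interfaces are for illustration and aren't actual atproto APIs:

    ```typescript
    // Descriptive sketch of the three roles; not actual atproto APIs.

    interface DataServer {
      // Hosts user repositories (posts, likes, follows); cheap to run.
      listRecords(did: string): Promise<unknown[]>;
    }

    interface AggregationInfra {
      // Crawls data servers and re-emits a firehose of events; expensive at scale.
      // An infrastructure takedown stops relaying a repo here, but the data can
      // persist on another data server and be picked up by other infra.
      subscribe(onEvent: (evt: { did: string; record: unknown }) => void): void;
    }

    interface AppClient {
      // Renders aggregated views, applying the user's chosen labelers.
      render(feed: unknown[], labelerDids: string[]): void;
    }
    ```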

    • Why would anyone run their own aggregator? (i.e. if you run a search engine, you can show contextual ads to recoup your investment and then some.)

      Sorry about going off-topic, I realise it's only tangentially about labelling.

Obviously this is a highly moderation-averse crowd, so I figured I’d add one small voice of support: I was very impressed by this post and your comment, and I think this is a huge jump ahead of Reddit’s mediocre system, or God forbid whatever’s going on at Twitter rn. This made me much more interested in Bluesky, and I might even click around your careers page.

In particular, applying Steam Curator functionality to content moderation is just perfect and I can’t believe I didn’t think of it before.

How will content that is illegal in some jurisdictions and legal in others be handled? Is there a presumed default jurisdiction, like California, or something?

  • Their stackable moderation system might actually allow one to implement this relatively easily.

    Add a moderation channel per country and let clients apply them depending on location/settings. It's naturally not perfect, but since one can just travel to other countries and get their (potentially less restricted) view, or, even simpler, use a VPN, it's about as good as basically any other such censorship measure.
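
    A minimal sketch of what that could look like client-side, with invented DIDs and country codes:

    ```typescript
    // Sketch: per-jurisdiction labelers applied from client location/settings.
    // The DIDs and country codes here are made up.

    const jurisdictionLabelers: Record<string, string[]> = {
      DE: ["did:example:labeler-de"],
      US: ["did:example:labeler-us"],
    };

    function labelersFor(countryCode: string, userChosen: string[]): string[] {
      // Union of the per-country set and the user's own subscriptions.
      const base = jurisdictionLabelers[countryCode] ?? [];
      return [...new Set([...base, ...userChosen])];
    }
    ```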

    • This still wouldn’t work, though. If someone uploads CSAM and it’s distributed to multiple users in a jurisdiction where it’s banned (which is virtually all of them) but only hidden by the moderation filters, then Bluesky would still be in a lot of pain for distributing said material.

      Also, filters which are optional on the user’s part can’t really be counted as moderation.

  • I’m unsure how it will play out in practice. I think it’s possible that different infra could wind up being deployed in jurisdictions that differ too significantly. Certainly that could happen outside of Bluesky.

    Bluesky itself is US-based.

    • Seems like you could follow the EU's rules, and use the list below to define tags that could be placed on a post per the EU categories?

      --

      https://www.europarl.europa.eu/RegData/etudes/ATAG/2020/6581...

      Using their guidelines:

      In the European Union (EU), there are several types of social media content that are considered illegal.

      ---

      * Incitement to Terrorism: Any content that encourages or promotes terrorist acts, violence, or extremism is prohibited.

      * Illegal Hate Speech: Social media posts that spread hate based on race, ethnicity, religion, gender, sexual orientation, or other protected characteristics are not allowed.

      * Child Sexual Abuse Material: Sharing, distributing, or creating content related to child sexual abuse is strictly illegal.

      * Infringements of Intellectual Property Rights: Posting copyrighted material without proper authorization violates intellectual property rights.

      * Consumer Protection Violations: Misleading advertisements, scams, or fraudulent content that harms consumers are prohibited.

      --

      These rules are further strengthened by stricter regulations for four specific types of content, which have been harmonized at the EU level:

      * Counter-Terrorism Directive: Addresses terrorist content.

      * Child Sexual Abuse and Exploitation Directive: Focuses on combating child sexual abuse material.

      * Counter-Racism Framework: Aims to prevent and combat racism and xenophobia.

      * Copyright in the Digital Single Market Directive: Deals with copyright infringement online.

      ---

      You could ideally put the EU illegal categories in a dropdown, old-Slashdot style. Each mod who selects the same option from the dropdown adds a point to the tag, and the post is removed upon some point threshold.

      This could be different for each region, and a post could be flagged with points for each region: have a region selection, and the illegal-tag list adjusts to that region. A post could maybe be tagged with multiple regions' infractions based on where the mod sits?

      Also, you can keep metrics on which infraction types posts get flagged for, plus you can move them to the "if law enforcement needs this post" S3 bucket for whatever retention period the laws require.
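
      Roughly, the point tally could work like this sketch (all names and the threshold are invented):

      ```typescript
      // Sketch of the point-threshold idea: each distinct mod who applies the
      // same illegal-content category to a post adds one point per region;
      // past a threshold, the post is removed.

      type Flag = { postUri: string; region: string; category: string; modDid: string };

      const REMOVAL_THRESHOLD = 3;

      function postsToRemove(flags: Flag[]): string[] {
        // Key: post|region|category; value: distinct mods, so one mod = one point.
        const points = new Map<string, Set<string>>();
        for (const f of flags) {
          const key = `${f.postUri}|${f.region}|${f.category}`;
          let mods = points.get(key);
          if (!mods) points.set(key, (mods = new Set<string>()));
          mods.add(f.modDid);
        }
        return [
          ...new Set(
            [...points.entries()]
              .filter(([, mods]) => mods.size >= REMOVAL_THRESHOLD)
              .map(([key]) => key.split("|")[0])
          ),
        ];
      }
      ```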

    • Putting aside the social issues for a moment, I can imagine a government deciding to run its own moderation server and mandating use of that server in the country in question. I'd prefer that Bluesky not enable that by geolocking users of the official client to individual moderation servers, though.

Seems like a really, really good way to create a really, really boring website.

ETA: Rereading this, that is probably not a very helpful HNy comment, so let me elaborate.

Maybe I am old-fashioned, but one of the things that the internet is most useful for is exploring places and ideas you would otherwise never encounter or consider. And just like taking a wooden ship to reach the North Pole, browsing around the internet comes with significant risk. But given the opportunity for personal growth and development, for change, and so on, those risks might well be worth it.

That model of the internet, as I said, is somewhat old-fashioned. Now, the internet is mostly about entertainment. Bluesky exists to keep eyeballs on phones, just like TikTok or Instagram or whatever. Sure, Bluesky is slightly more cerebral -- but only slightly.

People are generally not entertained by things that frustrate them (generally -- notable exceptions exist), so I can understand an entertainment company like Bluesky focusing on eliminating frustrations via obsessive focus on content moderation to ensure only entertaining content reaches the user. In that sense, this labeling thing seems really useful, just like movie ratings give consumers a general idea of whether the movie is something appropriate for them.

So in that sense, wonderful for Bluesky! But I think I'll politely decline joining and stick with other platforms with different aims.

  • What I want is a filter for angry posts. Social media exposes me to a wider cross section than I get in person and there is really a limit to the amount of distress I can absorb.

    • Right, and I think you've zeroed in on what I feel is the most important point here. Somehow, for a lot of people, "diversity of opinions" and "angry posts subject to moderation" are more or less the same thing. For me, those are distinct things, and I don't think diversity of opinion, at least not on the things that interest me (philosophy, astronomy, etc.), is in the crosshairs. Of course I feel that way because I feel like I'm right about something, and that something is the idea that diversity of opinion has a lot more to it than whether something is or isn't moderated.

  • The internet isn't one size fits all, all the time. Most people don't want to be challenged all the time and everywhere. Sometimes you want to watch a challenging documentary about socioeconomics in 17th century Poland and other times you want to watch Friends. I see a good use case here for BlueSky allowing users to vary moderation & use curated lists to separate interests & moods.

  • I think I can have lively, intellectually stimulating exposure without, say, someone advocating for the mass killing of gay people. Or engaging in an interesting political discussion without bad-faith conspiracy theorists shitting up the place. For example, the “chiller”, which as far as I know is just designed to cool down a hot-button discussion, actually sounds super amazing for this purpose.

    One of the things that frustrates me about browsing Twitter now is the constant bad-faith discussions about everything: one-off potshots that waste pixels and lead nowhere. A moderation tool that sifts through that and just gets me to the people who actually know wtf they’re talking about and are engaging honestly would benefit me greatly!

    • Definitely -- but the problem isn't really "content" moderation. What it seems like you actually want is personality / tone / user moderation -- which Bluesky isn't really doing.

      To analogize to real life, I have friends with whom I agree 100% on politics, but I never talk to them about it, because they're annoying when they do it. But I also have friends who disagree with me on political and other issues, but we have wonderful conversations because of the manner in which we disagree.

      I don't think what Bluesky is doing will actually help with this problem. For one thing, I think its design as a "feed" basically precludes any solid sort of discussion (compared to an Internet forum). The medium kind of encourages the "one-off potshots" you mentioned, and moderation won't do much to cure that.

      I could be wrong though!

    • In modern US political discourse, there is no nuance in “us vs them”. Your moderators that are meant to just tag “advocating for the mass killing of gay people” will also put a “here’s why I think you should vote for Trump” post in the same category.

  • BlueSky doesn’t care about eyeballs. It’s a non-profit enabling a common good.

    • Not to be nitpicky, but it's not quite that simple. Bluesky is a Public Benefit LLC, which is explicitly for-profit but does have some other constraints, so it does count for something. I can't find exactly what Bluesky's claimed public benefit is, though.

      https://theintercept.com/2023/06/01/bluesky-owner-twitter-el...

      "Liu, who answered some of my questions, did not respond when I asked for the exact language the Bluesky PBLLC used to describe its public benefit mission when incorporating the company. She also didn’t say whether the company would publish its annual benefits reports — reports that PBLLCs are required to create each year, but PBLLCs incorporated in Delaware, where Bluesky was incorporated, are not required to make them public."

I need to moderate the moderators.

Not in an 'I can ban these moderators from moderating my instance' way. I need a metamoderation mechanism. I need to see how good moderators are to establish trust, and when a moderator is taken over by a hostile actor I need to see its score tank.

Do you have something like this on the roadmap?

  • It sounds like, perhaps, the moderators only label the content. Then it’s up to your own client (and how you configure it) to filter the content, based on those labels.

    If I’ve got that right, then a client could be created that, e.g., displays labels from different moderators rather than filter the content. In fact, I’d guess most clients will have that mode.
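
    A sketch of those two modes with illustrative types; the same labels either filter posts out or get surfaced as annotations:

    ```typescript
    // Sketch of the two client modes: filter on labels, or merely annotate.
    // The types are illustrative, not a real lexicon.

    type LabeledPost = { text: string; labels: { src: string; val: string }[] };

    function renderFeed(
      posts: LabeledPost[],
      mode: "filter" | "annotate",
      hiddenVals: Set<string>
    ): string[] {
      return posts.flatMap((p) => {
        const hits = p.labels.filter((l) => hiddenVals.has(l.val));
        if (mode === "filter" && hits.length > 0) return []; // drop the post
        // Annotate mode keeps the post but surfaces who labeled it what.
        const note = hits.map((l) => `[${l.val} per ${l.src}]`).join(" ");
        return [note ? `${note} ${p.text}` : p.text];
      });
    }
    ```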

    • That's my understanding, too. And since it's underpinned by ATProto, rather than being coupled with Bluesky, "moderator score" apps could be built that independently track how 'useful' the labels are (and, by extension, the labeling services), subjective to each individual app's preferences. Then users could rely on moderation rankings from their favorite moderation ranking app to determine which moderators to use and when to switch if the quality tanks.

    • I need labels on labels and labels on labellers. I also need labellers for labellers. With that, I can create a network of labellers which can keep each other honest with enough distribution; think DNS root servers but which constantly check if every other root server is still reasonably trustworthy to be authoritative.

      Then I need users who (hopefully) vote on/rate/report labels, which is its own problem.

It seems to me that the relay is still a single point of failure here for moderation. What happens if my PDS gets blocked by the relay, for reasons that I disagree with? (Let's assume the content I post is legal within my jurisdiction). Are there any separate relays that I can use?

I think what might be needed here is that anyone with enough resources can run their own relay, and PDSes can subscribe to multiple relays and deduplicate certain things.
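
The dedup part could be as simple as keying events on repo and commit identity, something like this sketch (the field names are my guesses, not the actual firehose schema):

```typescript
// Sketch: subscribe to several relays, drop duplicate events by repo +
// commit identity.

type RepoEvent = { did: string; commitCid: string; payload: unknown };

const seen = new Set<string>(); // in practice this needs bounded eviction

function onRelayEvent(evt: RepoEvent, handle: (evt: RepoEvent) => void): void {
  const key = `${evt.did}:${evt.commitCid}`;
  if (seen.has(key)) return; // already received via another relay
  seen.add(key);
  handle(evt);
}
```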

  • > I think what might be needed here is that anyone with enough resources can run their own relay, and PDSes can subscribe to multiple relays and deduplicate certain things.

    That is how it's designed, yes!

Just learned about Bluesky’s labelling approach. The first thing that comes to mind is: who is responsible for the content on the platform - Bluesky? Labellers?

For example, some rogue user starts posting offensive content about other users, on the brink of breaking the law. Let’s say these other users will mention it to labellers, who this time will refuse to take this content down.

Can you tell me what will happen in such a scenario?

  • > For example, some rogue user starts posting offensive content about other users, on the brink of breaking the law. Let’s say these other users will mention it to labellers, who this time will refuse to take this content down.

    Under US law, the user posting the content is the only one legally responsible for it. Someone hosting the content could be required to take it down by court order or other legal process (like under the DMCA safe harbor provisions) if subject to US jurisdiction. Bluesky is, so they'd have a process same as anyone else in the US, and of course could make their own moderation decisions regardless on top. But the protocol allows third parties to take on any role in the system technically (though certain infra roles sound like they'd be quite expensive to run as a practical matter), so they could be subject to different law. Foreign judgments are not enforceable in the US if they don't meet the same First Amendment bar a domestic one would have to.

    Labellers, from the description, would never have any legal responsibility in the US, and they do not "take content down"; they only add speech (meta-information, their opinion on what applies to a given post) on top, best-effort. Clients and servers can then use the labels to decide what to show, or not.

    At any rate "on the brink of breaking the law" would mean nothing, legally. And "offensive" is not a legal category either. Bluesky or anyone else would be free to take it down anyway, there is zero restriction on them doing whatever they want and on the contrary that itself is protected speech. But they would be equally free to not do so, and if someone believed it actually broke one of the very limited categories of restrictions on free speech and was worth the trouble they'd have to go to court over it.

  • Can't you start your own labeler and just agree to take it down? Then others can subscribe to your labeler to avoid those posts?

Honestly, something here doesn't quite sit right with me.

From the article:

> No single company can get online safety right for every country, culture, and community in the world.

From this post:

> There are also "infrastructure takedowns" for illegal content and network abuse, which we execute at the services layer (ie the relay).

If there's really no point in running relays other than to keep the network online, and running relays is expensive and hard work that can't really be funded by individuals, then it seems like most likely there will be one relay forever. If that turns out to be true, then it seems like we really are stuck with one set of views on morality and legality. This is hardly a theoretical concern when it comes to the Japanese users flooding Bluesky largely out of dissatisfaction with Twitter's moderation of 'obscene' artworks.

  • Before the Elon event (and maybe again now), Pawoo was by far the most active Mastodon instance, and there's an almost complete partition between ‘Western’ and ‘Eastern’ Mastodon networks.

    • Yeah, this issue continues to cause a lot of strife across the Fediverse. Misskey.io and mstdn.jp are both extremely popular (presumably only second to Mastodon.social) and obviously these Japanese sites follow Japanese law and norms with regards to obscenity.

      I certainly am not saying that server operators should feel obliged to host content they do not like, especially if they believe it is illegal or immoral. After all, a huge draw of the Fediverse is the fact that you get to choose, right? Sure, personally I think all obscenity law is weapons-grade bullshit regardless of how despicable the subject matter may be, but also, server operators shouldn't feel pressure to compromise their ideals, attract a crowd of people they simply don't like, or (of course) run the risk of breaking the law in their jurisdiction. So what happens on the Fediverse seems like the right way for things to go, even if it is harmful to the federation in the short term.

      But that's kind of the double-edged sword. You either have centralization, where someone decrees "the ultimate line", or you don't. With Bluesky, there's a possibility that it will wind up being decentralized properly, but it could wind up being de facto centralized even if they uphold their promises, and I think that strongly devalues the benefits of decentralization where they count most. Today, there is in fact one company that holds the line, and it's unclear if that's going to meaningfully change.

      There are some aspects of AT Proto and Bluesky that I think are extremely cool: an example is identity; identity in AT Proto is MUCH better than it is in ActivityPub right now. However, I'm not surprised they are not going to acknowledge this problem. Just know that they are 100% aware of it, and my opinion is that they really do want to find an answer that won't piss everyone off, but they also probably want to avoid the perception that Bluesky is a haven for degenerates, especially early on when there are fewer network effects and they desperately need to appear "better" than what is already out there. Unfortunately, I can only conclude that their strategy is most likely the best one for the future of their network, but it still rubs me the wrong way.

Reddit's subreddit structure and the underlying moderation system is quite scalable: site admins only deal with the things that subreddit moderators have failed to. And, in case they keep failing, admins can shut down the subreddit or demote moderators responsible for it. The work is clearly split between admins and mods, and mods only work on the content they're interested in.

Now, with this model, I don't see such a scalable structure. You're not really offloading any work to moderation, and also, all mods will be working on all of the content. There are no subreddit-like boundaries to reduce the overlaps. I know mods can work on only certain feeds, but feeds overlap too.

It's also impossible to scale up mod power with this model when it's needed: for example, Reddit mods can temporarily lock posts for comments, lock a subreddit, or quarantine a subreddit to deal with varying degrees of moderation demand. That's impossible here, because there can't be a single authority to control the content flow.

How do you plan to address these scalability and efficiency issues?

  • >Reddit's subreddit structure and the underlying moderation system is quite scalable: site admins only deal with the things that subreddit moderators have failed to.

    All that happens is mods just lock any post with any hint of a problem. It's become, or rather started out as, ridiculous. They just lock instead of moderate.

    • True. Mod power is abused a lot. But that’s a different problem, not necessarily mutually exclusive with scaling.

  • > all mods will be working on all of the content. No subreddit-like boundaries to reduce the overlaps

    Not necessarily, that's up to the moderator.

    Today, I subscribe to the #LawSky and AppellateSky feeds because I am interested in legal issues. Sometimes these feeds have irrelevant material: either posts that happened to use the emoji the feed tracks for some non-legal reason, or just people chatting about their political opinions on some legal case.

    Someone could offer to label JUST the posts in these feeds with a "NotLegalTopic" tag and I would find that filter quite useful.

  • > You're not really offloading any work to moderation

    I think everyone at some stage has been burnt by top-down moderation (e.g., overzealous mods, brigading, account suspensions, subreddit shutdowns, etc.), and generally everyone finds it lacking because what's sensitive to one person might be interesting to another. Community-driven moderation liberalizes this model and allows people to live in whatever bubble they want to (or none at all). This kind of nit-picky moderation can be offloaded in this way, but it doesn't obviate top-down moderation completely (e.g., illegal content, incitement to violence, disinformation, etc.). Though a scoring system could be used to rate useful labellers, and global labels could be automated according to consensus (e.g., many high-rated labellers signalling disinformation on particular posts), as in the sketch below.
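
    A sketch of that consensus idea, with invented trust scores and thresholds:

    ```typescript
    // Sketch of consensus-driven global labels: promote a label when enough
    // highly-rated labellers independently apply it. Scores and thresholds
    // are invented for illustration.

    type AppliedLabel = { postUri: string; val: string; labelerDid: string };

    function globalLabels(
      applied: AppliedLabel[],
      trust: Map<string, number>, // labeler DID -> score in [0, 1]
      minTrust = 0.8,
      minCount = 5
    ): Map<string, string[]> {
      const counts = new Map<string, number>();
      for (const a of applied) {
        if ((trust.get(a.labelerDid) ?? 0) < minTrust) continue; // skip low-rated
        const key = `${a.postUri}|${a.val}`;
        counts.set(key, (counts.get(key) ?? 0) + 1);
      }
      const global = new Map<string, string[]>();
      for (const [key, n] of counts) {
        if (n < minCount) continue;
        const [uri, val] = key.split("|");
        global.set(uri, [...(global.get(uri) ?? []), val]);
      }
      return global;
    }
    ```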

Moderation does not sound like an additive function, i.e., multiple moderation "filters" that add up to the final experience. That seems like an almost Usenet-like interaction, where each user has their own schizoid killfile and the default experience is bad.

Rather, moderation is a cohesive whole that defines the direction of the community, the same rules and actions apply to everybody.

  • This was a very active topic of debate within the team. We ended up establishing the idea of "jurisdictions" to talk about it. If a moderation decision is universal to all viewers, we'd say that it's under a specific jurisdiction of a moderator. This is how a subreddit functions, with Reddit being the toplevel jurisdiction and the subreddits acting as child jurisdictions.

    The model of labelers as we're releasing in this first iteration is, as you say, an additive filtration system. They are "jurisdictionless." We chose this model because Bluesky isn't (presently) segmented into communities like Reddit is, and so we felt this was the right way to introduce things.

    That said, along the way we settled on a notion of the "user's personal jurisdiction," meaning essentially that you have certain rights to universally control your own interactions. Blocking is essentially under this umbrella, as are thread gates (who can reply). What's then interesting is that you can enlist others to help run your personal jurisdiction. Blocklists are an example of that which we have now: you can subscribe to blocks created by other people.

    This is why I'm interested in integrating labels into threadgates, and also exploring account-wide gates that can get driven by labels. Because then it does enable these labelers to apply uniform rules and actions to those who request it. In a way, it's a kind of dynamic subreddit that fits the social model.
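
    To sketch what a label-driven gate could look like (the rule shape here is hypothetical; today's threadgate lexicon has no label rule):

    ```typescript
    // Hypothetical label-driven reply gate; the current threadgate lexicon
    // (app.bsky.feed.threadgate) has follow/mention/list rules, not this.

    type LabelGate = {
      labelerDid: string;    // whose labels the thread author trusts
      blockValues: string[]; // replies carrying these labels get hidden
    };

    type Reply = { uri: string; labels: { src: string; val: string }[] };

    function replyAllowed(reply: Reply, gate: LabelGate): boolean {
      // Hide the reply if the trusted labeler applied any blocked value to it.
      return !reply.labels.some(
        (l) => l.src === gate.labelerDid && gate.blockValues.includes(l.val)
      );
    }
    ```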

    • > That said, along the way we settled on a notion of the "user's personal jurisdiction," meaning essentially that you have certain rights to universally control your own interactions. Blocking is essentially under this umbrella, as are thread gates (who can reply).

      As as user, "personal jurisdiction" is a critical feature to me. If I start a thread, I want to maintain some minimal level of agreeable behavior in the responses associated with my original post.

      It's sort of like online newspaper comments sections. Many unmoderated comment sections were once full of 20 disagreeable trolls who drove everyone else away. The bad drives out the good, and trolls accumulate over time. This doesn't even need to be ideological—I knew a semi-famous tech personality that had a handful of personal stalkers who infested every open comment section. Many newspapers fixed this by disabling comments or actually hiring moderators.

      I won't post to a service if the average reader of my posts will see a pile of nasty, unpleasant comments immediately following each of my posts.

      This is why I mostly prefer blogs, and moderated community forums. Smaller, well-moderated subreddits are great, as are private Discords.

    • This sounds very much like a federal system of government like the US, with each level of jurisdiction applying their own rules.

      For Bluesky, by default does power lie with the highest authority or the lowest authority (i.e., the user)?

      The US model was originally designed bottom up, with power having to be granted to the higher authority. Admittedly we've effectively abandoned this today.

  • Once you support delegating your killfile to other people it no longer functions the same as each user having their own. And FWIW, as an example, here on Hacker News, many of us have showdead turned on all the time and so while I am aware of the moderation put in place by the site, I actually see everything.

    Also: frankly, if there were someone willing to put a lot of effort into moderating stuff Hacker News doesn't--stuff like people asking questions you can answer in the article or via Google--I would opt into that as I find it wastes my time to see that stuff.

    And with delegation of moderation, I think it will start to feel like people voting for rulesets more than a bunch of eclectic chaos; if a lot of people agree about some rule that should exist, you will have to decide when you make a post how many people you are willing to lose in your potential audience.

How are labels managed? Assume I'm a labeller labeling certain posts as "rude". Will my "rude" label be identified as the same label as other labellers who also label posts as "rude", or will they be tied to my labeller identity (actually being e.g. "xyz123-rude" under the hood)?

> and while the Bluesky client hardcodes the Bluesky moderation

And what if I don't like your moderation? Can it be overruled, or is this just a system for people who want stricter moderation, not lighter?

  • Sounds like the answer is in the next part of that sentence:

    > another client can choose a different primary Labeler

    So you can overrule the moderation, but not if you use the official client.

  • > First, we've built our own moderation team dedicated to providing around-the-clock coverage to uphold our community guidelines.

    A partial answer* is that the Bluesky moderation enforces their community standards, so if you don't like that, then the platform may not be for you.

    * - because, yes, this does still leave a single entity in fundamental control. But I presume their focus is on the basic threshold (i.e., legality under US law) of content.

    • > A partial answer* is that the Bluesky moderation enforces their community standards, so if you don't like that, then the platform may not be for you.

      This article is about how different hosting instances can customize the moderation which is offered, and how the user can choose among the offered moderation settings. The whole point is to allow different moderation implementations, because different communities may have different needs.

    • I don't presume that when I hear terms like "moderation" or "community standards." I think he would have said "Bluesky automatically removes illegal content..." if that were true.

On a side note, I really like the freedom-to-speak and freedom-to-ignore approach; it's the thing I cherish the most about internet communication: the ability to be one's free and uninhibited self.

Do you plan to add custom labels?

Let’s say we ask a question to a politician and they ignore it.

Can we label the question as unanswered, so clients will remind users the question is unanswered?

Maybe add some screenshots of actual moderation to the landing page? It looks like it just shows profile settings.