
Comment by jdprgm

2 years ago

I have been loosely following Bluesky for a while and have read some blog posts, but I haven't delved super deep. Can you expand on the "infrastructure takedowns"? Does this still affect third-party clients? I am trying to understand to what degree this is a point of centralization that is open to moderation abuse, versus Bluesky acting as a protocol where, even if we really wanted to, we couldn't take something down other than off our own client.

The network can be reduced to three primary roles: data servers, the aggregation infrastructure, and the application clients. Anybody can operate any of these, but generally the aggregation infra is high scale (and therefore expensive).
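To make the three roles concrete, here is a toy in-memory model. It is only a sketch under simplifying assumptions: the class names `DataServer`, `Aggregator` and `Client` and their methods are hypothetical illustrations, not AT Protocol or Bluesky APIs. Data servers hold each user's records, the aggregator crawls every data server it knows about to build one network-wide view, and a client only reads that view.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataServer:
    """Hypothetical data server: stores each user's own records."""
    name: str
    records: dict[str, list[str]] = field(default_factory=dict)  # user -> posts

    def publish(self, user: str, post: str) -> None:
        self.records.setdefault(user, []).append(post)


@dataclass
class Aggregator:
    """Hypothetical aggregator: crawls many data servers into one index."""
    sources: list[DataServer]
    index: dict[str, list[str]] = field(default_factory=dict)

    def crawl(self) -> None:
        # Rebuild the network-wide view from every known data server.
        self.index = {}
        for server in self.sources:
            for user, posts in server.records.items():
                self.index.setdefault(user, []).extend(posts)

    def feed(self, users: list[str]) -> list[str]:
        return [post for user in users for post in self.index.get(user, [])]


@dataclass
class Client:
    """Hypothetical client app: renders views served by an aggregator."""
    aggregator: Aggregator

    def timeline(self, following: list[str]) -> list[str]:
        return self.aggregator.feed(following)


# Usage: one large data server, one small one, a single aggregator.
big = DataServer("big")
small = DataServer("small")
big.publish("alice", "hello")
small.publish("bob", "hi from a small server")
agg = Aggregator([big, small])
agg.crawl()
print(Client(agg).timeline(["alice", "bob"]))  # ['hello', 'hi from a small server']
```

The crawl step is where the "high scale (and therefore expensive)" part lives: it has to touch every data server, while a data server only stores its own users and a client only issues queries.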

So anyone can fulfill these roles. At present there are somewhere around 60 data servers, including one large one that we run; one aggregator infrastructure; and probably around 10 actively developed clients. We hope to see all of these roles expand over time, but a likely stable future will have about as many aggregator infrastructures as the Web has search engines.

When we say an infrastructure takedown, we mean removal from the aggregator and the data server that we run. This is high impact but not total: the user could migrate to another data server and then rely on another aggregator infrastructure to persist. If we ever fail (on policy, as a business, etc.), there is essentially a pathway for people to displace us.
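A toy simulation of why such a takedown is "high impact but not total". Everything here is a hypothetical sketch: the names `DataServer`, `Aggregator`, `host`, `takedown` and `visible` are invented for illustration, and it assumes the user still has a copy of their records to bring to the new data server (for example, a backup made before the takedown).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataServer:
    """Hypothetical data server that can refuse to host a user."""
    name: str
    records: dict[str, list[str]] = field(default_factory=dict)
    banned: set[str] = field(default_factory=set)

    def host(self, user: str, posts: list[str]) -> None:
        if user in self.banned:
            raise PermissionError(f"{user} is taken down on {self.name}")
        self.records[user] = list(posts)

    def takedown(self, user: str) -> None:
        # Remove the user's records and refuse to host them again.
        self.records.pop(user, None)
        self.banned.add(user)


@dataclass
class Aggregator:
    """Hypothetical aggregator with its own, independent moderation policy."""
    name: str
    sources: list[DataServer]
    banned: set[str] = field(default_factory=set)

    def takedown(self, user: str) -> None:
        self.banned.add(user)

    def visible(self, user: str) -> bool:
        # A user is visible if any crawled data server hosts them
        # and this aggregator has not taken them down.
        if user in self.banned:
            return False
        return any(user in server.records for server in self.sources)


main = DataServer("main")
indie = DataServer("indie")
main.host("alice", ["post 1", "post 2"])
backup = list(main.records["alice"])  # assumption: alice kept a copy

main_agg = Aggregator("main-agg", [main, indie])
main.takedown("alice")      # takedown on the operator's data server...
main_agg.takedown("alice")  # ...and on the operator's aggregator

indie.host("alice", backup)  # alice migrates to another data server
alt_agg = Aggregator("alt-agg", [main, indie])  # a different aggregator

print(main_agg.visible("alice"), alt_agg.visible("alice"))  # False True
```

The point the sketch tries to capture is that each operator only controls its own hosting and its own index; a second data server plus a second aggregator is enough for the user to persist.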

  • Why would anyone run their own aggregator? (For comparison, if you run a search engine, you can show contextual ads to recoup your investment and then some.)

    Sorry for going off-topic; I realise it's only tangentially about labelling.

  • Would it be possible to build a P2P aggregator (like YaCy, but for AT Protocol)?

    • It might be worth trying, but essentially what you'd be doing is cost/load sharing on the aggregation system. You could do that by computing indexes and sharing them around to reduce some of the required compute, and I suspect we'll be doing things like that (for example, having the precomputed follow graph index as a separate dataset). However, if you're trying to replace the full operational system, I think the only kind of load sharing that could work would require federated queries, which I consider a pretty unproven concept.
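As a rough illustration of the "compute an index once and share it around" idea, here is a sketch of a precomputed follow-graph index published as a standalone dataset. The function names and the JSON layout are hypothetical assumptions, not an actual Bluesky dataset format; the input is assumed to be `(follower, subject)` pairs already extracted from crawled follow records.

```python
from __future__ import annotations

import json
from collections import defaultdict


def build_follow_index(follows: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Reverse index: subject -> set of accounts that follow it."""
    followers: dict[str, set[str]] = defaultdict(set)
    for follower, subject in follows:
        followers[subject].add(follower)  # sets drop duplicate follow records
    return dict(followers)


def export_index(index: dict[str, set[str]]) -> str:
    # Sorted keys and lists so every publisher emits byte-identical output.
    return json.dumps({subject: sorted(fs) for subject, fs in sorted(index.items())})


def import_index(blob: str) -> dict[str, set[str]]:
    # A consumer loads the shared dataset instead of recomputing it.
    return {subject: set(fs) for subject, fs in json.loads(blob).items()}


follows = [("alice", "bob"), ("carol", "bob"), ("bob", "alice"), ("alice", "bob")]
shared = export_index(build_follow_index(follows))
index = import_index(shared)
print(sorted(index["bob"]))  # ['alice', 'carol']
```

This only offloads one derived index; anything that needs fresh, network-wide answers at query time still falls back to a full aggregator (or to the federated queries mentioned above).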