Comment by nightpool

3 days ago

Note that all of this reflects design decisions on Bluesky's closed-source "AppView" server—any federated servers interacting with Bluesky would need to construct their own timelines, and do not get the benefit of the work described here.

As others have noted, the appview is open source. The dataplane has two implementations, one in postgres and another in scylla. The scylla dataplane is closed, the postgres one is open.

The interesting next stage for the postgres implementation is to create a sync engine for partial syncs of the network, so that an appview can run affordably. We ran some benchmarks on the current state of the postgres implementation and found we could index 300k users on a $100/mo VPS. I think with a couple of weeks of optimization that could reach 1M users.

  • This is great to hear—my current understanding of the most recent state of the art on the topic is https://alice.bsky.sh/post/3laega7icmi2q which mentions that the self-hosted appview is not yet open source. So I'm glad to hear the situation has changed in the past 3 months.

    • It was open source (except the Scylla database layer) from the beginning, AFAIK - that blog post just says that they haven't set it up yet, because that's the hardest part to run

This is not true. Third party PDSes are fully supported by our app view, and our app view generates timelines for all the users on those PDSes.

  • What does this have to do with third party app views?

    • The statement "any federated servers interacting with Bluesky" is ambiguous, because Bluesky's federated model means there are many different types of servers, and one user's view of what a "federated server" is could be vastly different from another's.

      Federated PDSes (which is probably the closest to what people mean when they say they want to federate on Bluesky) would not need to reconstruct timelines if their users use the bsky.app appview.

What reason does Bluesky give for not opening up their AppView code?

Another notable component that is closed source is the discovery feed generator, where at least there is some reason.

My thinking has evolved on this topic significantly as of late. My current thinking is we should create a secure gossip network on top of the Bluesky API, and forget about all the DAG-CBOR stuff that gets stripped from the Jetstream. Hash the posts on the gossip layer and if posts change then diff them. This is all prep for when X billionaire buys out Bluesky then we just pop some signing key crypto on top of this gossip layer and wow! It's distributed!
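
To make the "hash the posts, diff them if they change" step concrete, here is a rough Python sketch. It is only an illustration under assumptions: posts are treated as plain JSON-like dicts, the `uri` and `text` field names and the `post_digest` / `diff_if_changed` helpers are hypothetical, and sorted-key JSON stands in for a real canonical encoding (it is not DAG-CBOR and not anything Bluesky ships).

```python
import difflib
import hashlib
import json


def post_digest(post: dict) -> str:
    """SHA-256 hex digest of a post serialized as sorted-key, compact JSON.

    Sorted keys are an assumed stand-in for a canonical encoding, so two
    peers holding the same post compute the same digest.
    """
    canonical = json.dumps(
        post, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_if_changed(old: dict, new: dict) -> list[str]:
    """Return a unified diff of two post versions, or [] if digests match."""
    if post_digest(old) == post_digest(new):
        return []
    # Diff the pretty-printed JSON so a change in any field shows up.
    old_lines = json.dumps(old, sort_keys=True, indent=2, ensure_ascii=False).splitlines()
    new_lines = json.dumps(new, sort_keys=True, indent=2, ensure_ascii=False).splitlines()
    return list(
        difflib.unified_diff(old_lines, new_lines, fromfile="old", tofile="new", lineterm="")
    )
```

A peer would keep the last digest it saw per post and only call `diff_if_changed` when a gossiped digest disagrees with the stored one. Signing the digests is the part that would come later and is not shown here.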