Comment by ahussain

20 days ago

It seems like the biggest downside of this world is iteration speed.

If the AT instagram wants to add a new feature (e.g., posts now support video!), can they easily update their "file format"? How do they update it in a way that is compatible with every other company that depends on the same format, without the underlying record becoming a mess?

That's a great question!

Adding new features is usually not a problem because you can always add optional fields and extend open unions. So, you just change `media: Link | Picture | unknown` to `media: Link | Picture | Video | unknown`.
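
To make that concrete, here's a rough TypeScript sketch of an open union. The `example.media.*` type names and the `describeMedia` helper are made up for illustration and aren't taken from any real lexicon; the point is only that a `$type` the client doesn't recognize falls through to a default arm instead of breaking anything.

```typescript
// Hypothetical shapes for illustration; not a real lexicon.
type Link = { $type: "example.media.link"; url: string };
type Picture = { $type: "example.media.picture"; src: string };
type Video = { $type: "example.media.video"; src: string };

// The "unknown" arm of the open union: any other $type ends up here.
type UnknownMedia = { $type: string; [key: string]: unknown };

type Media = Link | Picture | Video | UnknownMedia;

function describeMedia(media: Media): string {
  switch (media.$type) {
    case "example.media.link":
      return `link: ${String(media.url)}`;
    case "example.media.picture":
      return `picture: ${String(media.src)}`;
    case "example.media.video":
      return `video: ${String(media.src)}`;
    default:
      // A client written before some new arm existed would land here.
      return `unsupported media (${media.$type})`;
  }
}
```

A client that predates `Video` would simply be missing that `case` and would report video posts as unsupported rather than failing to parse them.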

You can't remove things, true, so records do accumulate some deprecated fields.

Re: updating safely, the rule is that you can't change which records a lexicon considers valid once it's being used in the wild. So you can't change whether a field is optional or required; you can only add new optional fields. The https://github.com/bluesky-social/goat tool has a linting command that instantly checks whether your changes pass the rules. In general it would be nice if lexicon tooling matured a bit, but I think with time it should get really good because there's explicit information the tooling can use.
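
Here's a toy version of that kind of check, just to show the rule is mechanical. This is not how goat implements it; `ObjectSchema` and `findBreakingChanges` are hypothetical names, and the schema shape is heavily simplified compared to a real Lexicon document.

```typescript
// Simplified, hypothetical schema shape; real Lexicon documents carry more detail.
interface ObjectSchema {
  required: string[];
  properties: Record<string, { type: string }>;
}

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Returns a human-readable list of changes that would alter which records are valid.
function findBreakingChanges(before: ObjectSchema, after: ObjectSchema): string[] {
  const problems: string[] = [];
  const wasRequired = new Set(before.required);
  const isRequired = new Set(after.required);

  for (const [name, prev] of Object.entries(before.properties)) {
    const next = hasOwn(after.properties, name) ? after.properties[name] : undefined;
    if (next === undefined) {
      problems.push(`field "${name}" was removed`);
      continue;
    }
    if (next.type !== prev.type) {
      problems.push(`field "${name}" changed type`);
    }
    if (wasRequired.has(name) !== isRequired.has(name)) {
      problems.push(`field "${name}" changed between optional and required`);
    }
  }

  // Brand-new fields are fine only if they are optional.
  for (const name of Object.keys(after.properties)) {
    if (!hasOwn(before.properties, name) && isRequired.has(name)) {
      problems.push(`new field "${name}" must be optional`);
    }
  }
  return problems;
}
```

An empty result means the change only added optional fields, which is the safe case described above.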

If you have to make a breaking change, you can make a new Lexicon. It doesn't have to cause tech debt because you can make all your code deal with only the new version, and convert old records to it during ingestion.
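
Roughly what "convert during ingestion" could look like, as a sketch. The `com.example.post` and `com.example.postV2` names, the field layouts, and `upgradeToV2` are all invented for this example; the idea is just that the conversion lives in one place and everything downstream only sees the new shape.

```typescript
// Hypothetical record shapes; the $type names below are made up for illustration.
interface PostV1 {
  $type: "com.example.post";
  text: string;
  imageUrl?: string;
}

interface PostV2 {
  $type: "com.example.postV2";
  text: string;
  media: { kind: "image" | "video"; url: string }[];
}

// Called once at ingestion so the rest of the app only handles PostV2.
function upgradeToV2(record: PostV1 | PostV2): PostV2 {
  if (record.$type === "com.example.postV2") {
    return record;
  }
  return {
    $type: "com.example.postV2",
    text: record.text,
    media: record.imageUrl ? [{ kind: "image", url: record.imageUrl }] : [],
  };
}
```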

  • That's true if you define the problem as "does my parser crash" and not whether the app is perceived as working correctly. If some platform adds support for video posts, then the next thing that happens is people start making posts that are only video. Meaning that in every other client, users see what appears to be an entirely empty post. Which will be considered a bug.

    This is the core argument of Moxie's seminal essay, The Ecosystem Is Moving:

    https://signal.org/blog/the-ecosystem-is-moving/

    One of the controversial things we did with Signal early on was to build it as an unfederated service. Nothing about any of the protocols we’ve developed requires centralization; it’s entirely possible to build a federated Signal Protocol-based messenger, but I no longer believe that it is possible to build a competitive federated messenger at all.

    That was written in 2016 but it was true then and continues to be true today. Users reject federated open platforms because the coordination costs mean they don't move as fast as proprietary centralized platforms, and they often appear broken even if technically working as designed.

    Nothing about that analysis is unique to social media. It is also true of file formats. OpenOffice never took off because new features got added to Office first, so files that used those features would open in semi-corrupted ways in OpenOffice. The fact that OO represented things internally using open unions didn't matter at all.

    • I disagree that Bluesky is in conflict with The Ecosystem Is Moving. In contrast to most decentralized/distributed protocol projects, they've managed to maintain control of almost all of their infrastructure, with the exception of the personal data servers (PDSes), of which they control 99.01%.[1]

      Almost all ATProto apps just fetch posts by handle => did:plc => post-type aka "lexicon", so they depend on what Bluesky decides to give them. If someone were to introduce unknowns into the flagship product's "lexicon" they could fix that at the API or Indexing level before shipping this data to the apps that depend on their API.

      An actually decentralized network would have to overcome Moxie's criticism of the ecosystem. Can it be done? We'll keep trying.

      [1] https://arewedecentralizedyet.online/

    • Well, this doesn't prevent the "flagship" app from shipping things and doesn't slow it down. So it's at least not slowing down development which is the argument the parent post was making.

      I've actually observed the exact opposite thing. Since Bluesky is open source, it's often visible when developers start working on a feature. And they often check in lexicon changes early on. As a result, there have been a few cases where third-party clients actually added support for these features earlier than the official one, since they already knew the shape of the data.

      This wouldn't always work, of course. Yes, if you're developing an app or a client, you better keep up with the ecosystem. But the landscape is competitive and there is no cost to switching. So if something falls behind, you can use something else.

  • Are these just guidelines or is this enforced in some way? I guess readers could validate and skip anything that doesn't match their schema.

    • Most apps reading records will validate a record against the schema for that type. E.g. there's nothing stopping you from making an app.bsky.feed.post record with more than 300 graphemes in the "text" field, but that post won't appear in the "official" app/website because it fails schema validation.

      Similarly, there's nothing stopping you from adding another field to your post. It'll just get ignored because the app you're using doesn't know about it. E.g. posts bridged from Mastodon by Bridgy have an extra field containing the full original post's text, which you can display in your app if desired. reddwarf.app does this with these posts.

      Here's the schema for bluesky posts, if you're interested: https://pdsls.dev/at://did:plc:4v4y5r3lwsbtmsxhile2ljac/com....

    • Yeah, look for this part in the article:

      Lexicon validation works the same way. The com.tumblr in com.tumblr.post signals who designed the lexicon, but the records themselves could have been created by any app at all. This is why apps always treat records as untrusted input, similar to POST request bodies. When you generate type definitions from a lexicon, you also get a function that will do the validation for you. If some record passes the check, great—you get a typed object. If not, fine, ignore that record.

      So, validate on read, just like files.
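
A minimal sketch of validate-on-read, assuming a cut-down post shape with only `$type` and `text`. `parsePost` is a hypothetical hand-written helper, not the function generated from the real lexicon, and the length check counts Unicode code points as a stand-in for a proper grapheme count.

```typescript
// Cut-down shape for illustration; the real schema has more fields.
interface Post {
  $type: "app.bsky.feed.post";
  text: string;
}

const MAX_TEXT_LENGTH = 300;

// Treats the record as untrusted input: returns a typed object or null.
function parsePost(input: unknown): Post | null {
  if (typeof input !== "object" || input === null) return null;
  const record = input as Record<string, unknown>;

  if (record["$type"] !== "app.bsky.feed.post") return null;

  const text = record["text"];
  if (typeof text !== "string") return null;

  // Approximation: counts Unicode code points, not grapheme clusters.
  if (Array.from(text).length > MAX_TEXT_LENGTH) return null;

  // Copy only the fields this app understands; unknown extras are dropped.
  return { $type: "app.bsky.feed.post", text };
}
```

A record with an extra field still parses (the extra is ignored), while an over-long post comes back as `null` and the app just skips it.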