← Back to context

Comment by pornel

2 days ago

I wonder why timelines aren't implemented as a hybrid gather-scatter choosing strategy depending on account popularity (a combination of fan-out to followers and a lazy fetch of popular followed accounts when follower's timeline is served).

When you have a celebrity account, instead of fanning out every message to millions of followers' timelines, it would be cheaper to do nothing when the celebrity posts, and later when serving each follower's timeline, fetch the celebrity's posts and merge them into the timeline. When millions of followers do that, it will be cheap read-only fetch from a hot cache.

This is probably what we'll end up with in the long-run. Things have been fast enough without it (aside from this issue) but there's a lot of low-hanging fruit for Timelines architecture updates. We're spread pretty thin from a engineering-hours standpoint atm so there's a lot of intense prioritization going on.

At some point they'll end up just doing the Bieber rack [1]. It's when a shard becomes so hot that it just has to be its own thing entirely.

[1] - https://www.themarysue.com/twitter-justin-bieber-servers/

@bluesky devs, don't feel ashamed for doing this. It's exactly how to scale these kinds of extreme cases.

  • I've stood up machines for this before I did not know they had a name, and I worked at the mouse company and my parking spot was two over from a J. Beibe'rs spot.

    So now we have Slashdot effect, HN hug, and its not Clarkson its... Stephen Fry effect? Maybe can be Cross-Discipline - there's a term for when lots of UK turns their kettles on at the same time.

    I should make a blog post to record all the ones I can remember.

  • Given that BlueSky is funded by Twitter, I'm assuming they know a lot more than us on how Twitter architects systems.

  • We never actually had a literal “Bieber Box”, but the joke took off.

    Hot shards were definitely an issue, though.

> and later when serving each follower's timeline, fetch the celebrity's posts and merge them into the timeline

I think then you still have the 'weird user who follows hundreds of thousands of people' problem, just at read time instead of write time. It's unclear that this is _better_, though, yeah, caching might help. But if you follow every celeb on Bluesky (and I guarantee you this user exists) you'd be looking at fetching and merging _thousands_ of timelines (again, I suppose you could just throw up your hands and say "not doing that", and just skip most or all of the celebs for problem users).

Given the nature of the service, making read predictably cheap and writes potentially expensive (which seems to be the way they've gone) seems like a defensible practice.

  • > I suppose you could just throw up your hands and say "not doing that", and just skip most or all of the celebs for problem users

    Random sampling? It's not as though the user needs thousands of posts returned for a single fetch. Scrolling down and seeing some stuff that's not in chronological order seems like an acceptable tradeoff.

This problem is discussed in the beginning of the Designing Data-Intensive Applications book. It's worth a read!

  • Do you know the name of the problem or strategy used for solving the problem? I'd be interested in looking it up!

    I own DDIA but after a few chapters of how database work behind the scenes, I begin to fall asleep. I have trouble understanding how to apply the knowledge to my work but this seems like a useful thing with a more clear application.

    • Yes, we used the Yahoo! “Feeding Frenzy” paper as the basis for the design of Haplocheirus (the timeline service).

Why do they "insert" even non-celebrity posts into each follower's timeline? That is not intuitive to me.

  • To serve a user timeline in single-digit milliseconds, it is not practical for a data store to load each item in a different place. Even with an index, the index itself can be contiguous in disk, but the payload is scattered all over the place if you keep it in a single large table.

    Instead, you can drastically speed up performance if you are able to store data for each timeline somewhat contiguously on disk.

  • Think of it as pre-rendering. Of pre-rendering and JIT collecting, pre-rendering means more work but it's async, and it means the timeline is ready whenever a user requests it, to give a fast user experience.

    (Although I don't understand the "non-celebrity" part of your comment -- the timeline contains (pointers to) posts from whoever someone follows, and doesn't care who those people are.)

    • Perhaps I misunderstanding, I thought the actual content of each tweet was being duplicated to every single timeline who followed the author, which sounded extremely wasteful, especially in the case of someone who has 200 million followers.

      1 reply →