When imperfect systems are good: Bluesky's lossy timelines

2 days ago (jazco.dev)

I wonder why timelines aren't implemented as a hybrid gather-scatter strategy chosen depending on account popularity (a combination of fan-out to followers and a lazy fetch of popular followed accounts when a follower's timeline is served).

When you have a celebrity account, instead of fanning out every message to millions of followers' timelines, it would be cheaper to do nothing when the celebrity posts, and later, when serving each follower's timeline, fetch the celebrity's posts and merge them into the timeline. When millions of followers do that, it will be a cheap read-only fetch from a hot cache.
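
A minimal sketch of that hybrid serve path, with made-up in-memory stores (`fanout_rows`, `celeb_posts`, `celeb_follows`) standing in for the real timeline tables and hot cache:

```python
import heapq

# Hypothetical in-memory stand-ins for the real timeline store and hot cache.
fanout_rows = {"alice": [(3, "post-c"), (1, "post-a")]}              # newest first
celeb_posts = {"celeb1": [(4, "celeb-post-2"), (2, "celeb-post-1")]}
celeb_follows = {"alice": ["celeb1"]}                                # lazy-fetched follows

def serve_timeline(user, limit=10):
    """Merge the user's fanned-out rows with lazily fetched celebrity posts."""
    streams = [fanout_rows.get(user, [])]
    for celeb in celeb_follows.get(user, []):
        streams.append(celeb_posts.get(celeb, []))
    # Every stream is already sorted newest-first, so a k-way merge keeps order.
    merged = heapq.merge(*streams, key=lambda row: row[0], reverse=True)
    return [post for _, post in merged][:limit]
```

The celebrity feeds are shared across all their followers, which is what makes the read side cacheable.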

  • This is probably what we'll end up with in the long run. Things have been fast enough without it (aside from this issue) but there's a lot of low-hanging fruit for Timelines architecture updates. We're spread pretty thin from an engineering-hours standpoint atm, so there's a lot of intense prioritization going on.

  • > and later when serving each follower's timeline, fetch the celebrity's posts and merge them into the timeline

    I think then you still have the 'weird user who follows hundreds of thousands of people' problem, just at read time instead of write time. It's unclear that this is _better_, though, yeah, caching might help. But if you follow every celeb on Bluesky (and I guarantee you this user exists) you'd be looking at fetching and merging _thousands_ of timelines (again, I suppose you could just throw up your hands and say "not doing that", and just skip most or all of the celebs for problem users).

    Given the nature of the service, making read predictably cheap and writes potentially expensive (which seems to be the way they've gone) seems like a defensible practice.

    • > I suppose you could just throw up your hands and say "not doing that", and just skip most or all of the celebs for problem users

      Random sampling? It's not as though the user needs thousands of posts returned for a single fetch. Scrolling down and seeing some stuff that's not in chronological order seems like an acceptable tradeoff.

  • This problem is discussed in the beginning of the Designing Data-Intensive Applications book. It's worth a read!

    • Do you know the name of the problem or strategy used for solving the problem? I'd be interested in looking it up!

      I own DDIA, but after a few chapters on how databases work behind the scenes, I begin to fall asleep. I have trouble understanding how to apply the knowledge to my work, but this seems like a useful thing with a clearer application.


  • Why do they "insert" even non-celebrity posts into each follower's timeline? That is not intuitive to me.

    • To serve a user timeline in single-digit milliseconds, it is not practical for a data store to load each item from a different place. Even with an index, the index itself can be contiguous on disk, but the payload is scattered all over the place if you keep it in a single large table.

      Instead, you can drastically speed up performance if you are able to store data for each timeline somewhat contiguously on disk.

    • Think of it as pre-rendering. Between pre-rendering and JIT collecting, pre-rendering means more work, but it's async, and it means the timeline is ready whenever a user requests it, giving a fast user experience.

      (Although I don't understand the "non-celebrity" part of your comment -- the timeline contains (pointers to) posts from whoever someone follows, and doesn't care who those people are.)


  • At some point they'll end up just doing the Bieber rack [1]. It's when a shard becomes so hot that it just has to be its own thing entirely.

    [1] - https://www.themarysue.com/twitter-justin-bieber-servers/

    @bluesky devs, don't feel ashamed for doing this. It's exactly how to scale these kinds of extreme cases.

    • I've stood up machines for this before; I didn't know they had a name. I worked at the mouse company, and my parking spot was two over from J. Bieber's spot.

      So now we have the Slashdot effect, the HN hug, and it's not Clarkson, it's... the Stephen Fry effect? Maybe it can be cross-discipline: there's a term for when much of the UK turns their kettles on at the same time.

      I should make a blog post to record all the ones I can remember.


    • Given that BlueSky is funded by Twitter, I'm assuming they know a lot more than us on how Twitter architects systems.

    • We never actually had a literal “Bieber Box”, but the joke took off.

      Hot shards were definitely an issue, though.

As a systems enthusiast I enjoy articles like this. It is really easy to get into the mindset of "this must be perfect".

In the Blekko search engine back end we built an index that was 'eventually consistent', which allowed updates to the index to be propagated to the user-facing index more quickly, at the expense that two users doing the exact same query would get slightly different results. If they kept doing those same queries they would eventually get the exact same results.

Systems like this bring in a lot of control systems theory because they have the potential to oscillate if there is positive feedback (in search engines, that positive feedback comes from the ranker, which looks at which link you clicked and gives it a higher weight), and it is important that they not go crazy. Some of the most interesting, and most subtle, algorithm work was done keeping that system "critically damped" so that it would converge quickly.
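
A toy illustration of the damping idea (not Blekko's actual ranker): applying the click signal through a small, fixed gain makes the weight converge toward the observed click rate instead of overshooting and oscillating:

```python
def update_weight(weight, clicked, gain=0.05):
    """Nudge a ranking weight toward the observed click signal.
    A small gain damps the positive-feedback loop: the weight converges
    toward the click rate instead of overshooting and oscillating."""
    target = 1.0 if clicked else 0.0
    return weight + gain * (target - weight)

# Repeated clicks converge the weight smoothly toward 1.0.
w = 0.5
for _ in range(100):
    w = update_weight(w, clicked=True)
```

Tuning the gain is the crude analogue of keeping the system critically damped: too large and it oscillates with noisy clicks, too small and it converges slowly.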

This description of how users' timelines are sharded, with the same sorts of feedback loops (in this case 'likes' or 'reposts'), sounds like a pretty interesting problem space to explore.

  • I guess I hadn’t considered that search engines could be reranking pages on the fly as I click them. I’ve been seeing my DuckDuckGo results shuffle around for a while now thinking it’s an awful bug.

    Like I click one page, don’t find what I want, and go back thinking “no, I want that other result that was below” and it’s an entirely different page with shuffled results, missing the one that I think might have been good.

    • That's connected with a basic usability complaint about current web interfaces, that ads and recommended content aren't stable. You very well might want to engage with an ad after you are done engaging what you wanted to engage with but you might never see it again. Similarly, you might see two or three videos that you want to click on on the side of a YouTube video you're watching but you can only click on one (though if you are thinking ahead you can open these in another tab.)

      On top of that immediate frustration, the YouTube style interface here

      https://marvelpresentssalo.com/wp-content/uploads/2015/09/id...

      collects terrible data for recommendations because, even though it gives them information that you liked the thumbnail for a video, they can't come to any conclusion about whether or not you liked any of the other videos. TikTok, by focusing on one video at a time, collects much better information.


    • I don't use DDG, but in my (very limited, just now) testing it doesn't seem to shuffle results unless you reload the page in some way. Is it possible your browser is reloading the page when you go back? If so, setting DDG to open links in new tabs might fix this problem.


    • This behavior started happening for me in the last few months. If I click on a result, then go back, I have different search results.

      I've found a workaround, though – click back into the DDG search box at the top of the page and hit enter. This then returns the original search results.

    • Hi - I work on search at DuckDuckGo. Do you mind sharing a bit more detail about this issue? What steps would allow us to reproduce what you're seeing?

  • > Some of the most interesting, and most subtle, algorithm work was done keeping that system "critically damped" so that it would converge quickly.

    Looking back at my early work with microservices I'm wondering how much time I would have saved by just manually setting a tongue weight.

  • Similar to how Google images loads lower quality blurred thumbnails towards the bottom of the window at first so that the user thinks they loaded faster

  • This is less a question of perfection and more one of trade-offs. The laws of physics put a limit on how efficiently you can keep data in NYC and London in perfect sync, so you choose CAP-style trade-offs. There are also $/SLO trade-offs: each 9 costs more money.

    I like your example; it is very interesting. If I get to work on such interesting problems (or even hear that someone on my team is working on one), I get happy.

    Interesting problems are rare because, like with a house, you might talk about brick vs. timber frame once, but you'll talk about cleaning the house every week!

  • Would you be willing to share more about how you guys did click ranking at Blekko? It's an interesting problem.

I'm a bit confused.

The lossy timeline solution basically means you skip updating the feed for some people who follow more than a reasonable number of accounts. I get that.

Seeing them get 96% improvements is insane. Does that mean they have a ton of users following an unreasonable number of people, or do they just have a very low threshold for a "reasonable" number of follows? I doubt it's the latter, since that would mean a lot of people would be missing updates.

How is it possible to get such massive improvements when you're only skipping a presumably small % of people per new post?

EDIT: nvm, I rethought it. The issue is that a single user with millions of follows will constantly be written to, which slows down the fanout service when a celebrity makes a post, since you're going through many DB pages.

  • When a system gets "overloaded", it typically enters a state of exponentially degrading performance, i.e. it performs a self-DDoS.

    > Seeing them get 96% improvements is insane

    TFA is talking about P99 tail latencies. It does not sound too insane to reduce tail latencies by extraordinary margins. Remember, it's just reshaping of latency distribution. In this case pathological cases get dropped.

  • > does that mean they have a ton of users following an unreasonable number of people

    Look at the accounts of OnlyFans models, crypto influencers, etc. They follow thousands or even tens of thousands of accounts in the hope that we will follow them in return.

    • I don't see that accommodating this behavior is prosocial or technically desirable.

      Can you think of a use case?

      All sorts of bots want this sort of access, but whether there are legitimate reasons to grant it to them on a non-sharded basis is another question since a lot of these queries do not scale resources with O(n) even on a centralized server architecture.


  • > does that mean they have a ton of users following an unreasonable number of people

    They do, there are groups of users on bluesky who follow inordinate numbers of other accounts to try and get follows back.

  • They were specifically looking at worst-case performance. P99 means 99th percentile, so they saw 96% improvement on the longest 1% of jobs.

Ok, I'm curious: since this strategy sacrifices consistency, has anyone thought about something that is not full fan-out on reads or on writes?

Let's imagine something like this: instead of writing to every user's timeline, the post is written once for each shard containing at least one follower. This caps the fan-out at write time to hundreds of shards. At read time, serving a given user's timeline reads that hot slice and filters for accounts the user actually follows. It definitely adds more read load, but

- the read is still colocated inside the shard, so latency remains low

- for mega-followers the page will not see older entries anyway

There are of course other considerations, but I'm curious about what the load for something like that would look like (and I don't have the data nor infrastructure to test it)
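
A rough sketch of that shard-level fan-out, with a hypothetical `followers` map and an in-memory `shard_feed` standing in for real storage:

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # illustrative; a real deployment would use hundreds

followers = {"author": ["u1", "u2", "u3"]}   # hypothetical follow graph
shard_feed = defaultdict(list)               # one slice of feed rows per shard

def shard_of(user):
    return zlib.crc32(user.encode()) % NUM_SHARDS

def publish(author, post):
    """Write once per shard holding at least one follower, capping write
    fan-out at NUM_SHARDS instead of the follower count."""
    for shard in {shard_of(f) for f in followers[author]}:
        shard_feed[shard].append((author, post))

def read_timeline(user, follows):
    """Read the user's shard slice, keeping only posts from actual follows."""
    wanted = set(follows)
    return [post for author, post in shard_feed[shard_of(user)] if author in wanted]
```

The trade-off is that each read scans every post written to the shard, including posts from authors the user doesn't follow.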

Hmm. Twitter/X appears to do this at quite a low number, as the "Following" tab is incredibly lossy (some users are permanently missing) at only 1,200 followed people.

It's insanely frustrating.

Hopefully you're adjusting the lossy-ness weighting and cut-off by whether a user is active at any particular time? Because, otherwise, applying this rule, if the cap is set too low, is a very bad UX in my experience x_x

  • > It's _insanely_ frustrating.

    > at only 1,200 followed people.

    I follow like, 50 people on bluesky. Who is following 1,200 people? What kind of value do you even get out of your feed?

    • 1,200 people is really nothing, especially if you have a job tangentially related to social media (for example, journalists). It's really simple: you are not the same type of user. You have 50 "acquaintances"; they have 1,200 "sources".

      The article is talking about people who have following/follower counts in the millions. Those are dozens of writes per second into one feed and a fan-out of potentially millions. Someone following 1,200 people, if every one of them actually posts once a day (most people do not), sees... a rate of about 0.014 writes per second.

      They should be background noise, irrelevant to the discussion. That level of work is within reasonable expectation. What they're pointing out is that Twitter is aggressively anti-perfectionist for no good technical reason - so there must be a business reason for it.


    • I can come up with 100 people I'd want to follow on Twitter, and I don't even have an account. Don't dismiss other people's use-cases if you don't have or understand them.

> Additionally, beyond this point, it is reasonable for us to not necessarily have a perfect chronology of everything posted by the many thousands of users they follow, but provide enough content that the Timeline always has something new.

While I'm fine with the solution, the wording of this sentence led me to believe that the solution was going to be imperfect chronology, not dropped posts in your feed.

So, let's say I follow 4k people in the example and have a 50% drop rate. It seems a bit weird that if (4k - 1) of the accounts I follow end up posting nothing in a day, I STILL have a 50% chance that I won't see the 1 account that does post. It seems to me that the algorithm should consider my feed's age (or the post freshness of the accounts I follow). Am I overthinking?
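
For reference, the follow-count-based loss factor being discussed can be approximated like this (the threshold and cap here are illustrative guesses, not Bluesky's real numbers):

```python
import random

REASONABLE_LIMIT = 2_000   # illustrative guess, not Bluesky's exact threshold
MAX_DROP = 0.75            # hypothetical ceiling on the loss factor

def drop_probability(num_follows):
    """Loss factor from the follow count alone: zero below the limit,
    growing toward MAX_DROP as the follow count balloons."""
    if num_follows <= REASONABLE_LIMIT:
        return 0.0
    excess = (num_follows - REASONABLE_LIMIT) / num_follows
    return min(MAX_DROP, excess)

def should_deliver(num_follows, rng=random):
    """Coin flip per write: True means the post lands on this timeline."""
    return rng.random() >= drop_probability(num_follows)
```

Note that the probability depends only on the follow count, which is exactly the objection above: a feed with 4k mostly silent follows is penalized the same as a firehose feed.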

  • This feels like an edge case.

    The "reasonable limit" is likely set based on experimentation, and thus on how much people post on average and the load it generates (so the real number is unlikely to be exactly "2000", IMHO).

    If you follow a lot of people, how likely it is that their posting pattern is so different from the average? The more people you follow, the less likely that is.

    So while you could end up in such a situation in theory, it would have to be a very unusual (and rare) case.

  • I think the 'law of large numbers' says that it's very unlikely for you to follow 4k and have _none_ of them posting. You could artificially construct a counter-example by finding 4k open but silent accounts, but that's silly.

    The other workaround is: follow everyone. Write some code to get what you want out of the jetstream event feed. https://docs.bsky.app/blog/jetstream

  • Yeah, this seems concerning to me. Maybe now as the platform is new this isn't much of an issue. But as accounts go inactive people will naturally collect "dead" accounts that they are still following. On Facebook it isn't uncommon of to have old accounts of sociable people naturally collect thousands of friends.

    It seems that what they are trying to measure is "busy timelines", and it seems like they could probably measure that more directly. For example: what is the number of posts in the timeline over the last 24h? It should be fairly easy to use this as the metric for calculating the drop rate.
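
The activity-based alternative suggested above could be sketched as follows (the 500-posts-per-day target is a made-up number):

```python
def activity_based_drop(posts_last_24h, target_per_day=500):
    """Derive the drop rate from timeline busyness instead of raw follow
    count, so dead or silent accounts don't count against the user."""
    if posts_last_24h <= target_per_day:
        return 0.0
    return 1.0 - target_per_day / posts_last_24h
```

A quiet feed full of dead accounts gets a drop rate of zero under this metric, while a feed receiving 1,000 posts a day would drop about half.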

Love reading these sorts of "technical problem + solution" pieces. The world does not need more content, in general, but it does need more of this kind of quality information sharing.

Anyone following hundreds of thousands of users is obviously a bot account scraping content. I'd ban them and call it a day.

However, I do love reading about the technical challenge. I think Twitter has a special architecture for celebrities with millions of followers. Given Bluesky is a quasi-clone, I wonder why they did not follow in these footsteps.

  • You don't need to follow anyone (or even have an account) to scrape content… Someone following a huge amount of accounts usually wants to get a lot of followers quickly this way through follow-backs.

  • > Given Bluesky is a quasi-clone, I wonder why they did not follow in these footsteps.

    There are only six users with over a million followers, and none with two million yet.

    I'm sure they'll get there.

  • Maybe not hundreds of thousands but I'd follow anybody that looks remotely interesting and then primarily use customized feeds. E.g. if I wanna hear about union news, my personal irl network, etc I check that feed

  • if you want to scrape all the content, that's what the firehose is for, and it's allowed.

    the only reason to mass-follow is for spam purposes.

    • This does assume that scrapers are smart, and often they're really not. They have infrastructure for scraping HTML from webpages at scale and that is the hammer they use for all nails. (e.g. Wikipedia has to fight off scraper traffic despite full archives being available as torrents, etc.)

      In this case I agree though, they're all spammers and/or "clout farmers", or trying to make an account seem more authentic for future scams. They want to generate follow notifications in the hope that some will follow them back (and if they don't, they unfollow again after some interval).


  • BlueSky has starter packs that let you mass-follow at the click of a button. Join 10 starter packs in one day and you are following over 1,000 people. Sometimes following others is the only way to get people to engage with your content.

  • Or just enforce a maximum number of followed accounts.

    • No matter how high you set a maximum limit for interactions on social media (followers, friends, posts, etc), someone will reach the limit and complain about it. I can see why Bluesky would prefer a "soft limit", where going above the limit will degrade the experience. It gives more flexibility to adjust things later, and prevents obnoxious complaints from power users with outsized influence.


AWS has a cool general approach to this problem (one badly behaved user affecting others on their shard):

https://aws.amazon.com/builders-library/workload-isolation-u...

The basic idea is to assign each user to multiple shards, decreasing the chances of another user sharing all their shards with the badly behaved user.

Fixing this issue as described in the article makes sense, but if they had done shuffle sharding in the first place, it would cover any new issues without affecting many other users.
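
A minimal, deterministic sketch of shuffle sharding as described in the AWS article (the shard and replica counts are illustrative):

```python
import hashlib

def shuffle_shard(user, num_shards=8, replicas=2):
    """Deterministically map each user to their own small set of shards.
    Two users rarely share *all* their shards, so one noisy neighbor only
    degrades partial overlaps rather than every tenant on a shard."""
    digest = hashlib.sha256(user.encode()).digest()
    pool = list(range(num_shards))
    chosen = []
    for i in range(replicas):
        chosen.append(pool.pop(digest[i] % len(pool)))
    return set(chosen)
```

With 8 shards and 2 replicas there are 28 possible shard pairs, so most user pairs overlap on at most one shard.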

  • I think shuffle sharding is beneficial for read-only replica cases, not for write scenarios like this. You'd have to write to the primary and not to a "virtual node", right? Or am I understanding it incorrectly? I just read that article now.

Nice problem to have, though. Over on Nostr they're finding it a real struggle to get to the point where you're confident you won't miss replies to your own notes, let alone replies from other people in threads you haven't interacted with.

The current solution is for everyone to use the same few relays, which is basically a polite nod to Bluesky's architecture. The long-term solution... well, it involves a lot of relay hint dropping and a reliance on Japanese levels of acuity when it comes to picking up on hints (among clients). But (a) it's proving extremely slow going and (b) it only aims to mitigate the "global as relates to me" problem.

When I go directly to a user's profile and see all their posts, sometimes one of their posts isn't in my timeline where it should be. I follow less than 100 users on Bluesky, but I guess this explains why I occasionally don't see a user's post in my timeline.

Lossy indeed.

  • If another user you follow reposted or replied to a post, it can affect its order in your following feed. You shouldn't be seeing any loss as described in the article from following only 100 users.

    • I've experienced it with "first-party" posts, not replies. A post wouldn't show in my timeline but would on the user's profile. This was on the official Android app, but there has been an update or two, so I'll have to double-check.

  • Are you using an app, website, or combination?

    Various clients (I'm writing one) interpret the timeline differently, since a feed that shows literally everything could include things most people would find undesirable or irrelevant (replies to strangers, replies to replies to replies, etc.).

    • I'm using the official android app. There has been an update or two so I'll have to confirm it's still happening

I am a bit perplexed, though, as to why they have implemented fan-out in a way where each "page" blocks fetching further pages; they would not have been affected by the high tail latencies if they had not done this:

"In the case of timelines, each “page” of followers is 10,000 users large and each “page” must be fanned out before we fetch the next page. This means that our slowest writes will hold up the fetching and Fanout of the next page."

Basically means that they block on each page, process all the items on the page, and then move on to the next page. Why wouldn't you rather decouple page fetcher and the processing of the pages?

A page-fetching activity should be able to continuously keep fetching further sets of followers one after another, and should not wait for each of the items in a page to be updated before continuing.

Something that comes to mind would be to have a fetcher component that fetches pages, stores each page in S3 and publishes the metadata (content) and the S3 location to a queue (SQS) that can be consumed by timeline publishers which can scale independently based on load. You can control the concurrency in this system much better, and you could also partition based on the shards with another system like Kafka by utilizing the shards as keys in the queue to even "slow down" the work without having to effectively drop tweets from timelines (timelines are eventually consistent regardless).

I feel like I'm missing something and there's a valid reason to do it this way.
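
A toy version of the decoupled design suggested above, using an in-process bounded queue in place of S3/SQS (the real pipeline would obviously be distributed):

```python
import queue
import threading

def fetch_pages(pages, q):
    """Producer: stream follower pages into a bounded queue without
    waiting for the previous page's writes to complete."""
    for page in pages:
        q.put(page)
    q.put(None)  # sentinel: no more pages

def fan_out_worker(q, writes):
    """Consumer: drain pages and perform the timeline writes."""
    while (page := q.get()) is not None:
        for follower in page:
            writes.append(follower)  # stand-in for a timeline insert

pages = [["a", "b"], ["c"], ["d", "e"]]
q = queue.Queue(maxsize=2)
writes = []
producer = threading.Thread(target=fetch_pages, args=(pages, q))
producer.start()
fan_out_worker(q, writes)
producer.join()
```

The bounded queue gives backpressure: the fetcher runs ahead of the writers, but only by a fixed number of pages.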

  • I interpreted this as a batch write, e.g. "write these 10k entries and then come back". The benefit of that is way less overhead versus 10k concurrent background routines each writing individual rows to the DB. The downside is, as you've noted, that you can't "stream" new writes in as older ones finish.

    There's a tradeoff here between batch size and concurrency, but perhaps they've already benchmarked it and "single-threaded" batches of 10k writes performed best.

I found it odd to base the loss-factor on the number of people you follow, rather than a truer indication of timeline-update-frequency. What if I follow 4k accounts, but each of those accounts only posts once a decade? My timeline would be become unnecessarily lossy.

The funny thing is that all of the centralization in Bluesky is defended as being necessary to provide things like global search and all replies in a thread, things that Mastodon simply punts on in the name of decentralization. But then ultimately, Bluesky has to relax those goals after all.

  • True. In context though Bluesky can tweak the volume knob as and when they see fit, whereas for Mastodon it's stuck where it is.

This design makes sense if you didn’t previously have any limit on the number of people an account could follow. But why not have a limit?

An interesting solution to a challenging problem. Thank you for sharing it.

I must admit, I had some trouble following the author's transition from "celebrity" with many followers to "bot" with many follows. While I assume the work done for a celebrity to scatter a bunch of posts would be symmetric to the work done for a commensurate bot to gather a bunch of posts, I had the impression that the author was introducing an entirely different concept in "Lossy Timelines."

That's quite interesting, and a challenge I had not thought of. I understand the need for a solution and I believe this works reasonably well, but I am wondering what happens to users who follow a lot of accounts with below-average activity. This may naturally happen on new social media platforms, with people trying out the service and possibly abandoning it.

The "reasonable limit" is likely set to account for such an effect, but I wonder if a per-user limit based on the activity of the accounts one follows would be an improvement on this approach.

To help avoid the hot shard problem, I wonder how capping follows per "timeline" would perform. Essentially, each user would have a separate timeline per 1,000 follows, and the client would merge them. You could still do the lossy part, if necessary, by only loading a percentage of the actual timelines. That wouldn't help the celebrity problem, but it was already acknowledged earlier that the solution to that is to not fan out celebrity accounts.
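
A client-side merge of such per-1,000-follow timeline partitions might look like this sketch (the partition contents are made up; `sample_frac < 1.0` gives the lossy variant):

```python
import heapq
import random

# Made-up partitions: each holds (timestamp, post) rows sorted newest-first.
partitions = [
    [(9, "p9"), (4, "p4")],
    [(7, "p7"), (2, "p2")],
    [(8, "p8"), (1, "p1")],
]

def merged_timeline(parts, limit=4, sample_frac=1.0, rng=random):
    """k-way merge of per-partition timelines on the client. Reading only a
    fraction of the partitions (sample_frac < 1.0) gives the lossy variant."""
    if sample_frac < 1.0:
        k = max(1, int(len(parts) * sample_frac))
        parts = rng.sample(parts, k)
    merged = heapq.merge(*parts, key=lambda row: row[0], reverse=True)
    return [post for _, post in merged][:limit]
```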

In the fanout design, why not dynamically move on to the next 10,000-user page as soon as all tasks for the current page are either queued or processing? Would that approach improve throughput, or could it introduce issues like resource contention?

On a related note, I am pretty confident that one of the main reasons the WWW succeeded where previous attempts failed was that it very specifically allowed 404s.

A simpler option is to put a limit on the number of accounts one can follow. Who needs to follow more than 4k accounts if not bots?

The solution to this problem is known and implemented already: the social web should be distributed across thousands of pods, each containing at most a few thousand users. Diaspora has already been working like this for 15 years. It is technically harder to build initially, but it then divides all the problems (maintenance, moderation, load, censorship, trust in the owner...), which makes the network much more resilient. Bluesky knows that, and they are allowing other people to host their software, but they are really not pushing for it, and I highly doubt that the experience of a user on a small external pod is the same as on bluesky.com.

  • This particular problem will still exist for a fediverse server. You follow 10k people? Nice, now you're getting ddos'd by their activities. Though, most fediverse servers being monolithic applications definitely helps.

> some of them will do abnormal things like… well… following hundreds of thousands of other users.

Sounds like Bluesky Pro.

I think something like this was a FB engineering interview (several years ago), just for instagram feeds.

The combination of fan-out to followers and a lazy fetch of popular followed accounts when a follower's timeline is served is a good implementation in hot-reload scenarios.

Are users informed that they follow too many creators and now they will not see every post on their timelines?

The typical problem of a centralized infrastructure.

Indeed:

> This means each user gets their own Timeline partition, randomly distributed among shards of our horizontally scalable database (ScyllaDB), replicated across multiple shards for high availability

"Lossy timelines" have already been implemented in ActivityPub and Mastodon by design. Will Bluesky ever catch up? It remains to be seen.

Note that all of this reflects design decisions on Bluesky's closed-source "AppView" server—any federated servers interacting with Bluesky would need to construct their own timelines, and do not get the benefit of the work described here.

  • As others have noted, the appview is open source. The dataplane has two implementations, one in postgres and another in scylla. The scylla dataplane is closed, the postgres one is open.

    The interesting next stage for the postgres implementation is to create a sync engine for partial syncs of the network, so that an appview can run affordably. We ran some benches on the current state of the postgres implementation and found we could index 300k users on a $100/mo vps. I think with a couple of weeks of optimization that could reach 1mm users.

  • What reason does Bluesky give for not opening up their AppView code?

    Another notable component that is closed source is the discovery feed generator, where at least there is some reason.

    • When I read the spec, it seemed like the operator of an AppView & Relay would be most in need of compensation for their hosting costs, due to the amount of demand on those components. I believe the spec allows an operator to implement their own AppView and monetize it as that operator sees fit, so that they can afford to operate the service and maybe even make money off of it, making it their full-time job.


    • What else? Profit, by means of doing work that benefits first and foremost the private proprietors of the closed source.

      If they gave it away (which used to be unfeasible until the digital era), they'd feel they were losing their valuable effort, which they're intent on concentrating, not diluting.

  • My thinking has evolved on this topic significantly as of late. My current thinking is we should create a secure gossip network on top of the Bluesky API, and forget about all the DAG-CBOR stuff that gets stripped from the Jetstream. Hash the posts on the gossip layer and, if posts change, diff them. This is all prep for when X billionaire buys out Bluesky: then we just pop some signing-key crypto on top of this gossip layer and wow! It's distributed!

So the system design puts the burden on what seems to be synchronous, not queued, writes to get easy reads. I usually prefer simpler cheaper writes at the cost of more complicated reads as the reads scale and parallelize better.

Anecdotally, I ran into a similar solution "by chance".

Long ago, I worked for a dating site. Our CTO at the time was a "guest of honor" who was brought in by a family friend working in marketing at the time. The CTO was a university professor who took on the job as a courtesy (he didn't need the money or the fame; he had enough of both, and actually liked teaching).

But he instituted a lot of experimental practices in the company. Such as switching roles every now and then (anyone in the company could apply for a different role, except administration, and try wearing a different hat), or having company-wide discussions of problems where employees would prepare a presentation on their current work (that was very unusual at the time, but the practice became more common in larger companies afterwards).

Once, he announced a contest for a problem he was trying to solve. Since we were building a dating site, the obvious problem was matching. The trouble was that the more properties there were to match on, the longer matching would take (besides other problems, that is). So the system punished site users who took the time to fill out the questionnaires as well as they could, and favored the "slackers".

I didn't have any bright ideas on how to optimize the matching / search for matches. So, ironically, I asked: "what if we just threw away properties beyond a certain threshold, randomly?" I was surprised that my idea received any traction at all. The answer was along the lines of "that would definitely work, but I wouldn't know how to explain this behavior to the users". At the time, I took that to be yet another eccentricity of the old man... but hey, the idea stuck with me for a long time!
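
For what it's worth, the idea itself is only a few lines (the threshold here is hypothetical):

```python
import random

MAX_PROPERTIES = 20  # hypothetical cap on matchable properties

def capped_properties(props, rng=random):
    """Randomly drop properties beyond the threshold so matching cost stays
    bounded no matter how thoroughly a user filled the questionnaire."""
    if len(props) <= MAX_PROPERTIES:
        return dict(props)
    keep = rng.sample(list(props), MAX_PROPERTIES)
    return {key: props[key] for key in keep}
```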

  • The answer to that reply is you don't need to explain it to your users. People are used to fuzzy/best-effort sort of matching, especially when it's specifically presented as a "matching algorithm" instead of a "filter".

> This process involves looking up all of your followers, then inserting a new row into each of their Timeline tables in reverse chronological order with a reference to your post.

Seriously? Isn't this the nut of your problem right here?

  • What alternative design did you have in mind, given that a Twitter-like data model of individual follows is likely a strict product requirement?

    There are obviously other ways of doing it (doing the timeline propagation in a batch job, fanning out the reads rather than the writes), but they've got their own problems. Probably worse ones.

    • Wouldn't a hybrid approach make sense?

      Periodically classify users as hot/cold based on their activity, build hot-follower timelines on write, and build cold-follower timelines on read.
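      A minimal sketch of that hot/cold split (the threshold, class, and field names are all assumptions for illustration, not Bluesky's API):

```python
import heapq
from collections import defaultdict

HOT_FOLLOWER_THRESHOLD = 1_000_000  # assumed cutoff for a "hot" account

class Timelines:
    def __init__(self):
        self.follower_counts = defaultdict(int)   # author -> follower count
        self.followers = defaultdict(list)        # author -> [followers]
        self.follows_hot = defaultdict(list)      # user -> hot authors followed
        self.materialized = defaultdict(list)     # user -> fanned-out (ts, text)
        self.author_posts = defaultdict(list)     # author -> own (ts, text)

    def is_hot(self, author):
        return self.follower_counts[author] >= HOT_FOLLOWER_THRESHOLD

    def post(self, author, ts, text):
        self.author_posts[author].append((ts, text))
        if self.is_hot(author):
            return                        # cheap write: no fan-out at all
        for f in self.followers[author]:  # classic fan-out on write
            self.materialized[f].append((ts, text))

    def read(self, user, limit=50):
        # Merge the pre-built timeline with each followed hot account's log.
        lists = [self.materialized[user]]
        lists += [self.author_posts[a] for a in self.follows_hot[user]]
        newest_first = (sorted(l, reverse=True) for l in lists)
        return list(heapq.merge(*newest_first, reverse=True))[:limit]
```

      Reads stay cheap for typical users (one materialized list plus a handful of hot logs, likely served from a hot cache), while celebrity posts cost nothing at write time.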

      1 reply →

An airline reservation system has to be perfect (no slack in today's skies), a hotel reservation can be 98% perfect so long as there is some slack and you don't mind putting somebody up in a better room than they paid for from time to time.

A social media system doesn't need to be perfect at all. It was clear to me from the beginning that Bluesky's feeds aren't very fast. Not that they're crazy slow, but if it saves money or effort, it's no problem if notifications are delayed by 30s.

  • It's funny because, in my experience, airline systems are very imperfect (timing-wise).

    I (unwisely) tried to purchase an Icelandair ticket via the Chase travel portal. I would get a reservation number, go buy seats on Icelandair's website, and a few days later the entire reservation would vanish into the ether. Rinse and repeat 3x.

    I can't remember the exact verbiage, but basically tickets can be "reserved" or "booked": one means the ticket is allocated, and the other means the ticket is actually paid for. I eventually sat on the phone with an executive support person as they booked the ticket and got it all the way through. It turns out Chase reserves a ticket on an airline but has an SLA of ~3 days to actually pay for it, while Icelandair requires a ticket to be paid within 24 hours, so the reservation was timing out.

    • (Replying to both you and the parent poster)

      Airlines are far from perfect. They overbook flights and sometimes have to ask people to leave, paying them for the inconvenience. My wife and I once got $1,000 apiece, plus a hotel and food voucher, to volunteer to take a flight the next day during a layover in Atlanta.

      As far as your particular situation, the number one rule of using a third party portal to book flights or hotels is - don’t.

      I understand that Icelandair is not a transfer partner of Chase. But even in that case, I would just wait to use my points until I could use a transfer partner.

      On the earning side if paying cash, the difference between 2x/3x points when booking directly and 5x when going through the portal just isn’t worth the risk.

      4 replies →

  • Especially for a free service!

    Think about other ad-supported sites. If you're an engineer working on an ad-supported product, the perfect consistency you strive for in your code is not the product. The product is the sum of all of the content the user sees. And the costs of the tradeoffs you make are paid for by ads.

    Am I willing to see 10x more ads for perfect consistency? Definitely not.

  • Does the fact that an airline booking system must be perfect explain why so many flights are overbooked or cancelled?

    • No, overbooking is a business decision justified by the fact that, statistically, not all passengers will actually show up for their flight, and lower load factors cost money.

      28 replies →

  • > airline reservation system has to be perfect (no slack in today's skies)

    The slack just gets moved. Airlines oversell by about 8 percent. All systems need some slack in them. Isn't that kinda Bob's Law or something?

  • Miscommunication leads to bad outcomes. One missed or out-of-order message could easily lead to a fight, a lawsuit, a flash mob, threats of violence (which then need to be taken seriously), swatting, doxxing, etc...

    Msg 1: I hate ___insert_controversal_person_category_here___

    Msg 2: Is the kind of statement that really sets me off

    Msg 1 has a very different meaning if you don't see Msg 2.

It’s really impressive how well Bluesky is performing. It really feels like a throwback to older social media platforms with its simplicity and lack of dark-patterns. I’m concerned that all the great work on the platform, protocol, etc won’t shine in the long term as they eventually need to find a revenue source.

  • I love Mastodon but I have to admit that Bluesky has clearly out-engineered them. Of course, they started with much more expertise and resources. I hope for ActivityPub compatibility soon to unite the two.

  • They've done an incredible job running with an extremely low headcount and crazy-efficient use of hardware. It would be easy to 10x their expenses by blindly following the standard cloud deployment playbook. Hopefully this level of efficiency means they don't have to work as hard and can stay pre-revenue, a pure play, for a very long time.

  • Absolutely. The profit motive is the root of most evil. It is a shame that so many are trained to believe it is the only motive available.

    • I completely agree with this... but without profit, people can't get paid, and they'll stop building. I do hate this incessant need for growth, of course, but financial growth is necessary to pay people, give them raises, and allow them upward mobility at the company.

      I hope Bluesky is able to find a model that works for them AND for consumers. (I do know it's an open protocol, so it'll live on without Bluesky itself! However, as this post shows, it's a lot of work to build on the prototype... so if not them, who? And if someone else, how will they become sustainable?)

      20 replies →

    • There's no reason Bluesky has to emulate what FB Newsfeed and Twitter/X did to drive engagement by promoting certain items over others.

      At the very least, they do have hindsight to learn from.

      4 replies →

    • Bluesky is a private for-profit company that has taken $37M in venture capital.

      https://www.piratewires.com/p/interview-with-jack-dorsey-mik...

      > That was the second moment I thought, uh, nope. This is literally repeating all the mistakes we made as a company. This is not a protocol that’s truly decentralized. It’s another app. It’s another app that’s just kind of following in Twitter’s footsteps, but for a different part of the population.

      > Everything we wanted around decentralization, everything we wanted in terms of an open source protocol, suddenly became a company with VCs and a board. That’s not what I wanted, that’s not what I intended to help create.

[stub for offtopicness]

  • I don’t understand the infatuation with Bluesky. The minute they need money it’ll go the way of Reddit and Twitter.

    • If everything good is assumed to eventually become bad, why not use things while they are good and then immediately move on when it becomes bad?

      2 replies →

    • People want the old Twitter, and Bluesky is close to that. It also cosplays being decentralized to people who don’t look too closely.

      4 replies →

    • Twitter was always... not great (there's a reason it was affectionately known as the Hellsite), but it had 16 years of being _tolerable_ for most people (the real exodus only really started with Musk's changes, though there had been a couple of smaller ones previously, mostly over Twitter messing with the API).

      Frankly, if I get 16 years out of Bluesky before having to move onto the next one, I can live with that. Social networks _die_; it has always been so. USENET, livejournal, Tumblr, twitter... nothing lasts forever.

    • My bluesky feed is somehow even more abhorrent than my twitter one, except that instead of right wing hate it's Facebook memes about "reading banned books"

      3 replies →

  • [flagged]

    • This is such a lazy, uninformed take that people just love to repeat. 1) the left on Bluesky is full of in-fighting because neolib left are convinced that Harris lost because of racism/sexism and the progressive left spend a lot of their time trying to educate (and dunk on) them for their braindead takes, and 2) any social media platform will become an echo chamber if you only choose to follow people that echo your sentiments. As long as Bluesky isn't actively censoring and suspending journalists and other public figures, there is no equivalence to Truthsocial or X and only a clown/shill/psyop would suggest as much.

      It's really not that hard to find enriching content from all walks of life on Bluesky -- if somebody can't find it, they just suck at the internet.

      To be clear, I do have grievances with Bluesky, and I do not have high hopes for its future -- but that's because I personally believe that social media in general is both fatally flawed from the start and detrimental to society, and will never not devolve into ad-riddled or otherwise enshittified services. I am not a Bluesky shill, I'm just here to call out the silly false equivalence with Truthsocial, etc.

      4 replies →

    • Say what you will about Bluesky, but at least Jay isn't palling around with honest-to-god neo-Nazis.

  • I would be so much more interested in Bluesky if it were technically impossible for a random super rich guy to buy and bend it to his whims.

    • Isn't that the whole point of Bluesky? Empowering users to take their data where they want. It's completely open source and well-documented. If someone buys Bluesky, you can move all your data to a different service that follows the same protocol.

      4 replies →

  • Centrally-controlled social media platforms are not a good thing, period. Neither Twitter/X, nor BlueSky. Let's not fete them.

  • I honestly am annoyed by websites and services like this. It annoys the crap out of me and everyone else, but since it's pretty much forced down their throats, the "eventually" is "eventually everyone stops complaining".

  • Bluesky is the Conservative Dad Beer of "left" short form social media.

    I implore everyone to use something better like Mastodon or maybe minds

I don’t see much call for Bluesky anymore…