← Back to context

Comment by amarant

17 hours ago

Oh wow, that's a super simple solution, and I can immediately see how this gets you the best of both worlds!

And since it's only used for speedy lookup we can even use a fast, cheap and non-secure hashing algorithm, so it's really a low-cost operation!

Thanks! This was really one of those aha-moments where I feel kinda stupid to not have thought of it myself!

I've also written about sharding.

https://planetscale.com/blog/database-sharding

  • Thanks! Another great article! It strikes me that modulo sharding on a sequential id would probably work rather well, but it was not mentioned in this article. Is there a reason I'm not seeing that this is bad? I guess resharding might be problematic, as you can't easily split a shard in two without rewriting every shard if you do that...

    • > I guess resharding might be problematic

      yes, that's the crux of the problem. when you have a sharded database, typically you want to be able to add (and/or remove) shards easily and non-disruptively.

      for example - your database is currently sharded across N nodes, and it's overloaded due to increased traffic, so you want to increase it to N+1 nodes (or N+M nodes, which can add complexity in some cases)

      if adding a shard causes a significant increase in load on the database, that's usually a non-starter for a production workload, because at the time you want to do it, the database is already overloaded

      you can read about this in the original Dynamo paper [0] from almost 20 years ago - consistent hashing is used to select 3 of the N nodes to host a given key. when node N+1 is added, it joins the cluster in such a way that it will "take over" hosting 1/Nth of the data, from each of the N nodes - meaning that a) the joining process places a relatively small load on each of those N nodes and b) once the node is fully joined, it reduces overall load evenly across all N nodes.

      0: https://www.allthingsdistributed.com/2007/10/amazons_dynamo....