Comment by cmcarthur

4 days ago

This is my understanding too, and it is particularly problematic for workloads that read and write heavily on very recent data. Partitioning by a date or by an auto-incrementing id still runs into the same issue.

Ex: your prefix is `/id=12345`. S3, under the hood, generates partitions named `/id=` and `/id=1`. Now your id rolls over to `/id=20000`. All read/write activity on `/id=2xxxx` falls back to the original partition, so on rollover you end up with read contention.
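
To make the rollover concrete, here is a toy model of the behavior described above. It is only a sketch of my understanding, not S3's actual routing logic: the `route` function and the longest-matching-prefix rule are assumptions made for illustration, and "the original partition" is taken to mean `/id=`.

```python
def route(key: str, partitions: list[str]) -> str:
    """Toy model: pick the longest partition prefix that matches the key.

    Raises ValueError if no partition prefix matches.
    """
    matches = [p for p in partitions if key.startswith(p)]
    return max(matches, key=len)


# Partitions as described in the example above.
partitions = ["/id=", "/id=1"]

# Ids in the 1xxxx range land on the dedicated partition...
before_rollover = route("/id=12345", partitions)

# ...but after rollover, 2xxxx ids only match the catch-all prefix.
after_rollover = route("/id=20000", partitions)
```

In this model every key under `/id=2xxxx` ends up on `/id=`, which is the contention described above.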

For any high-throughput workloads with unevenly distributed reads, you are best off using some element of randomness, or some evenly distributed partition key, at the root of your path.
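
A minimal sketch of what that could look like, assuming a hash-derived shard at the root of the key. The `shard=` naming, the `sharded_key` helper, and the default of 16 shards are all hypothetical choices for illustration, not anything S3 requires.

```python
import hashlib


def sharded_key(record_id: int, num_shards: int = 16) -> str:
    """Build a key whose root is an evenly distributed, deterministic shard.

    The shard comes from a hash of the id, so sequential ids are spread
    across prefixes instead of piling onto the newest one.
    """
    digest = hashlib.sha256(str(record_id).encode("utf-8")).hexdigest()
    shard = int(digest[:8], 16) % num_shards
    return f"/shard={shard:02x}/id={record_id}"


# Sequential ids right after a rollover now fan out over several prefixes.
keys = [sharded_key(i) for i in range(20000, 20100)]
distinct_shards = {k.split("/")[1] for k in keys}
```

Because the shard is derived from the id rather than drawn at random, a reader who knows the id can recompute the full key. The tradeoff is that listing ids in order now means querying every shard prefix.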
