Comment by erik_seaberg

3 years ago

S3 is a key-value store where object keys might contain slashes, but the implied directories don’t really exist. This is a problem for Spark and Hadoop jobs that expect to rename a large temp dir to signal that a stage’s output has been committed: HDFS can do that atomically, but S3 has no rename operation, so every object has to be copied to its new key and the old one deleted, one by one. IAM security policies also apply to keys or prefixes (renaming an object might change someone’s access level) and changes are cached for tens of minutes.
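A rough sketch of why that per-object rename isn't atomic. This uses a plain Python dict as a stand-in for a bucket, makes no real S3 calls, and `rename_prefix` is a made-up helper for illustration only:

```python
def rename_prefix(store, src, dst):
    """Move every key under `src` to `dst`, one object at a time.

    `store` is a dict standing in for a bucket (an assumption, not the S3 API).
    Yields a snapshot after each object so the intermediate states are visible.
    """
    # Take the key list up front so we don't mutate the dict while iterating it.
    for key in sorted(k for k in store if k.startswith(src)):
        new_key = dst + key[len(src):]
        store[new_key] = store[key]  # "copy" to the new key
        del store[key]               # "delete" the old key
        yield dict(store)


bucket = {
    "tmp/job1/part-0": b"a",
    "tmp/job1/part-1": b"b",
    "tmp/job1/part-2": b"c",
}
snapshots = list(rename_prefix(bucket, "tmp/job1/", "out/job1/"))

# After the first object moves, a reader sees output split across both prefixes.
halfway = snapshots[0]
committed = [k for k in halfway if k.startswith("out/job1/")]
pending = [k for k in halfway if k.startswith("tmp/job1/")]
```

Anyone listing `out/job1/` partway through sees a partial result, and a crash mid-loop leaves it that way. There is no single step that flips the whole "directory" at once.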

S3 didn’t use to be strongly consistent, though surprisingly they delivered strong read-after-write consistency in December 2020 (https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-s3...), which I hope they’re proud of.

Some people have been crazy enough to store tables of padded data in the keys of a lot of zero-length objects (which they do charge for) and use ListObjects for paginated prefix queries. It doesn’t much matter whether keys have slashes or commas or what.
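Roughly what that trick looks like, as a toy simulation. A sorted Python list stands in for the bucket's key space, and `make_key`, `list_page`, and `query` are hypothetical names loosely modeled on a ListObjects-style "prefix + continue after this key" call, not the real API:

```python
import bisect


def make_key(table, user_id, ts):
    # Zero-pad the numbers so lexicographic key order matches numeric order.
    return f"{table}/{user_id:08d}/{ts:012d}"


def list_page(sorted_keys, prefix, start_after=None, max_keys=2):
    """Return (page, next_token) for keys starting with `prefix`."""
    if start_after is None:
        i = bisect.bisect_left(sorted_keys, prefix)
    else:
        i = bisect.bisect_right(sorted_keys, start_after)
    page = []
    while (i < len(sorted_keys) and len(page) < max_keys
           and sorted_keys[i].startswith(prefix)):
        page.append(sorted_keys[i])
        i += 1
    # Only hand back a token if another matching key is still waiting.
    more = i < len(sorted_keys) and sorted_keys[i].startswith(prefix)
    return page, (page[-1] if more else None)


def query(sorted_keys, prefix):
    """Collect every page for `prefix`, following the continuation token."""
    results, token = [], None
    while True:
        page, token = list_page(sorted_keys, prefix, start_after=token)
        results.extend(page)
        if token is None:
            return results


# The "rows" live entirely in the keys; the objects themselves would be empty.
rows = [(7, 100), (7, 20), (7, 3), (42, 5), (700, 1)]
keys = sorted(make_key("events", u, t) for u, t in rows)

user7 = query(keys, "events/00000007/")
```

Thanks to the padding, `user7` comes back in timestamp order (3, 20, 100) and user 700 doesn't leak into the results. Swap the slashes for commas and nothing changes, since the store only ever compares strings.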