
Comment by nitwit005

3 years ago

Even if it is bypassing the file system, S3 is itself essentially a file system. It has all the usual features of paths, permissions, and so on. I assume it can't completely escape the same issues.

S3 is a key-value store where object keys might contain slashes, but the implied directories don’t really exist. This is a problem for Spark and Hadoop jobs that expect to rename a large temp dir to signal that a stage’s output has been committed: HDFS can do that atomically, but S3 has no rename, so every object has to be copied to its new key and then deleted, one at a time. IAM security policies also apply to keys or prefixes, so renaming an object might change who can access it, and policy changes are cached for tens of minutes.
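To make the rename problem concrete, here is a minimal sketch that models a bucket as a flat dict (a hypothetical in-memory stand-in, not boto3 or any real S3 client). A "directory rename" becomes a loop of copy + delete per key, and the made-up `fail_after` parameter simulates a crash partway through to show that other readers can see a half-moved output directory.

```python
class FlatBucket:
    """Hypothetical in-memory stand-in for an S3 bucket: a flat key -> bytes map.

    There are no directory objects; "dirs" exist only as shared key prefixes.
    """

    def __init__(self):
        self.objects: dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        self.objects[key] = body

    def list_prefix(self, prefix: str) -> list[str]:
        # Keys are returned in lexicographic order, as S3 listings are.
        return sorted(k for k in self.objects if k.startswith(prefix))

    def copy(self, src: str, dst: str) -> None:
        self.objects[dst] = self.objects[src]

    def delete(self, key: str) -> None:
        del self.objects[key]


def rename_prefix(bucket: FlatBucket, src: str, dst: str, fail_after=None) -> int:
    """Emulate a directory rename: one copy plus one delete per object.

    fail_after (hypothetical) raises after that many objects have moved,
    leaving the partial rename visible to any other reader.
    """
    moved = 0
    for key in bucket.list_prefix(src):  # snapshot of keys taken before mutating
        if fail_after is not None and moved == fail_after:
            raise RuntimeError(f"crashed after moving {moved} objects")
        bucket.copy(key, dst + key[len(src):])
        bucket.delete(key)
        moved += 1
    return moved
```

For example, three part files where the "rename" dies after two leaves output split across both prefixes:

```python
b = FlatBucket()
for i in range(3):
    b.put(f"job/_temporary/part-{i:05d}", b"data")
try:
    rename_prefix(b, "job/_temporary/", "job/output/", fail_after=2)
except RuntimeError:
    pass
print(b.list_prefix("job/"))
# ['job/_temporary/part-00002', 'job/output/part-00000', 'job/output/part-00001']
```

On HDFS the equivalent is a single metadata operation, which is why committers that rely on it had to be redesigned for S3.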

S3 didn’t use to be strongly consistent, though, surprisingly, they delivered strong read-after-write consistency in December 2020 (https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-s3...), which I hope they’re proud of.

Some people have been crazy enough to store tables of padded data in the keys of a lot of zero-length objects (which they do charge for) and use ListObjects for paginated prefix queries. It doesn’t much matter whether keys have slashes or commas or what.
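A rough sketch of that key-only "table" trick, under loudly stated assumptions: `KeyOnlyTable` is a hypothetical in-memory stand-in for a bucket of zero-length objects, and its `list_page` only loosely mimics the `Prefix`, `StartAfter` and `MaxKeys` parameters of S3's ListObjectsV2. The `events/<user_id>,<ts>` key layout is made up for illustration. Zero-padding each field is what makes lexicographic key order match numeric order, so a prefix listing doubles as a sorted range query.

```python
class KeyOnlyTable:
    """Hypothetical stand-in for a bucket of zero-length objects.

    All the data lives in the keys; the object bodies are empty.
    """

    def __init__(self):
        self._keys: set[str] = set()

    def put_empty(self, key: str) -> None:
        self._keys.add(key)

    def list_page(self, prefix: str, start_after: str = "", max_keys: int = 1000):
        # Lexicographic order, filtered by prefix and strictly after start_after,
        # at most max_keys per page; the bool says whether more keys remain.
        matches = sorted(k for k in self._keys if k.startswith(prefix) and k > start_after)
        return matches[:max_keys], len(matches) > max_keys


def row_key(table: str, user_id: int, ts: int) -> str:
    # Zero-padded fields make lexicographic order match numeric order,
    # so a prefix scan returns rows sorted by user_id, then ts.
    return f"{table}/{user_id:010d},{ts:013d}"


def parse_key(key: str) -> tuple[str, int, int]:
    table, rest = key.split("/", 1)
    user_id, ts = rest.split(",")
    return table, int(user_id), int(ts)


def scan(store: KeyOnlyTable, prefix: str, page_size: int):
    """Yield every matching row, one page-sized listing call at a time."""
    start_after = ""
    while True:
        page, truncated = store.list_page(prefix, start_after, page_size)
        for key in page:
            yield parse_key(key)
        if not truncated:
            return
        start_after = page[-1]  # resume after the last key we saw
```

Querying one user's rows with a tiny page size takes two listing calls and still comes back in timestamp order:

```python
store = KeyOnlyTable()
for user_id, ts in [(42, 1700000000300), (7, 1700000000100),
                    (42, 1700000000100), (42, 1700000000200)]:
    store.put_empty(row_key("events", user_id, ts))
print(list(scan(store, f"events/{42:010d},", page_size=2)))
# [('events', 42, 1700000000100), ('events', 42, 1700000000200), ('events', 42, 1700000000300)]
```

The separators are arbitrary: the store only ever compares whole keys as strings, so a comma works just as well as a slash.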