← Back to context

Comment by rohan_

6 days ago

Was the key unlock here the ability to append data to an object?

(https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3...)

There are a few things unlocked by Ursa:

1. It is leaderless by design. So there is no single lead broker you need to route the traffic. So you can eliminate majority of the inter-zone traffic.

2. It is lakehouse-native by design. It is not only just use object storage as the storage layer, but also use open table formats for storing data. So streaming data can be made available in open table formats (Iceberg or Delta) after ingestion. One example is the integration with S3 Tables: https://aws.amazon.com/blogs/storage/seamless-streaming-to-a... This would simplify the Kafka-to-Iceberg integration.

Having built a prototype of a system like Ursa myself, this isn't something that you need to use at all, especially because it seems like this is only available in S3 Express One Zone.

  • Ursa is available across all major cloud providers (GCP, Azure, AWS). It also supports pluggable write ahead log storage. For latency relaxed workloads, we use object storage to get the cost down. So it works with AWS S3, GCP GCS, Azure Blob Store. For latency sensitive workloads, we use Apache BookKeeper which is a low-latency replicated log storage. This allows us to support workloads ranging from milliseconds to sub-seconds. You can tune it based on latency and cost requirements.

That’s probably not as useful as you think. Unless things have changed more recently, you need to set the offset from which to append, which makes it near useless for most use cases where appending would actually be useful.