Comment by drchaim
6 days ago
Sounds interesting, just some questions:
- Are tables partitioned? By year/month?
- How do you handle too many small Parquet files?
- Are updates/deletes allowed/planned?
Great questions, thanks! Partitioning: yes, Arc partitions by measurement > year > month > day > hour. This structure makes time-range queries very fast and simplifies retention policies (you can drop by hour/day instead of re-clustering).
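For illustration, here's roughly how a record's timestamp maps to a partition directory under that hierarchy (a minimal Python sketch; the Hive-style `key=value` naming is my assumption for readability, not necessarily Arc's exact on-disk format):

```python
from datetime import datetime, timezone

def partition_path(measurement: str, ts: datetime) -> str:
    """Map a record to its partition directory, mirroring the
    measurement > year > month > day > hour hierarchy."""
    return (
        f"{measurement}/"
        f"year={ts.year}/month={ts.month:02d}/"
        f"day={ts.day:02d}/hour={ts.hour:02d}"
    )

ts = datetime(2025, 6, 1, 14, 30, tzinfo=timezone.utc)
print(partition_path("cpu_usage", ts))
# cpu_usage/year=2025/month=06/day=01/hour=14
```

Dropping an hour of data then amounts to removing a single directory, which is what keeps retention cheap.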
Small Parquet files: we batch writes by measurement before flushing, typically every 10K records or 60 seconds, whichever comes first. That keeps file counts manageable while maintaining near-real-time visibility. Optional compaction jobs can later merge smaller Parquet files for long-term optimization.
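The count-or-age flush logic itself is simple; here's a minimal sketch (all names are hypothetical, and a production version would also need a background timer so an idle buffer still flushes after 60 seconds):

```python
import time

class WriteBuffer:
    """Per-measurement buffer that flushes on a record-count
    or age threshold, whichever is hit first."""

    def __init__(self, flush_fn, max_records=10_000, max_age_s=60.0):
        self.flush_fn = flush_fn        # e.g. a Parquet writer
        self.max_records = max_records
        self.max_age_s = max_age_s
        self.records = []
        self.first_write = None

    def append(self, record):
        if self.first_write is None:
            self.first_write = time.monotonic()
        self.records.append(record)
        age = time.monotonic() - self.first_write
        if len(self.records) >= self.max_records or age >= self.max_age_s:
            self.flush()

    def flush(self):
        if self.records:
            self.flush_fn(self.records)  # one Parquet file per flush
            self.records.clear()
            self.first_write = None

buf = WriteBuffer(flush_fn=lambda recs: print(f"flushing {len(recs)} records"))
for i in range(25_000):
    buf.append({"ts": i, "value": i * 0.1})
buf.flush()  # flush the tail
# flushing 10000 records
# flushing 10000 records
# flushing 5000 records
```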
Updates/deletes: today Arc is append-only (like most time-series systems). Updates/deletes are planned via “rewrite on retention”, meaning you’ll be able to apply corrections or retention windows by rewriting affected partitions.
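In practice, a retention-driven rewrite could look roughly like this with pyarrow (a hypothetical sketch: the `ts` column name is assumed, and a real implementation would write to a temp file and swap it in atomically rather than overwriting in place):

```python
import pyarrow as pa
import pyarrow.compute as pc
import pyarrow.parquet as pq

def rewrite_partition(path: str, cutoff: int) -> None:
    """Read one partition's Parquet data, keep only rows at or
    after the retention cutoff, and write the result back."""
    table = pq.read_table(path)
    kept = table.filter(pc.greater_equal(table["ts"], pa.scalar(cutoff)))
    pq.write_table(kept, path)

# Demo: build a tiny partition file, then apply a retention cutoff.
pq.write_table(
    pa.table({"ts": [1, 2, 3, 4], "value": [10.0, 20.0, 30.0, 40.0]}),
    "part.parquet",
)
rewrite_partition("part.parquet", cutoff=3)
print(pq.read_table("part.parquet").to_pydict())
# {'ts': [3, 4], 'value': [30.0, 40.0]}
```

The same mechanism covers corrections: rewrite the affected partition with updated rows instead of filtered ones.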
The current focus is on predictable write throughput and analytical query performance, but schema evolution and partial rewrites are definitely on the roadmap.