Comment by ignaciovdk
6 days ago
Great questions, thanks! Partitioning: yes, Arc partitions by measurement > year > month > day > hour. This structure makes time-range queries very fast and simplifies retention policies (you can drop by hour/day instead of re-clustering).
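For illustration, here's a minimal sketch of how a record might map to that partition path (the helper name and exact layout string are my assumptions, not Arc's actual code):

    from datetime import datetime, timezone

    def partition_path(measurement: str, ts: datetime) -> str:
        # Hypothetical helper: measurement > year > month > day > hour
        ts = ts.astimezone(timezone.utc)
        return (f"{measurement}/year={ts.year:04d}/month={ts.month:02d}"
                f"/day={ts.day:02d}/hour={ts.hour:02d}")

    # partition_path("cpu", datetime(2025, 1, 7, 14, 30, tzinfo=timezone.utc))
    # -> "cpu/year=2025/month=01/day=07/hour=14"

With that layout, a time-range scan only touches the matching hour directories, and a retention policy is essentially a directory delete.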
Small Parquet files: we batch writes by measurement before flushing, typically every 10K records or 60 seconds. That keeps file counts manageable while maintaining near-real-time visibility. Optional compaction jobs can later merge small Parquet files for long-term read optimization.
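Roughly, the flush policy could look like the sketch below (names and the pyarrow write path are assumptions for illustration; a real implementation would also need a background timer so the 60-second bound holds even when no new writes arrive):

    import time
    import pyarrow as pa
    import pyarrow.parquet as pq

    MAX_RECORDS = 10_000  # flush after 10K buffered records...
    MAX_AGE_S = 60        # ...or 60 seconds, whichever comes first

    class MeasurementBuffer:
        def __init__(self, measurement: str):
            self.measurement = measurement
            self.records = []
            self.opened_at = time.monotonic()

        def add(self, record: dict):
            self.records.append(record)
            if (len(self.records) >= MAX_RECORDS
                    or time.monotonic() - self.opened_at >= MAX_AGE_S):
                self.flush()

        def flush(self):
            if not self.records:
                return
            table = pa.Table.from_pylist(self.records)
            # Hypothetical file name; the real path would follow the
            # measurement/year/month/day/hour layout described above.
            pq.write_table(table, f"{self.measurement}-{int(time.time())}.parquet")
            self.records = []
            self.opened_at = time.monotonic()

A compaction pass then amounts to reading several of these small files for one partition and rewriting them as a single larger Parquet file.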
Updates/deletes: today Arc is append-only (like most time-series systems). Support for both is planned via “rewrite on retention”: corrections and retention windows will be applied by rewriting the affected partitions.
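As a file-level sketch of what rewrite-on-retention could look like, assuming the hour-partitioned layout above (all names here are hypothetical, not Arc's API):

    import shutil
    from datetime import datetime, timedelta, timezone
    from pathlib import Path

    def enforce_retention(root: Path, measurement: str, keep: timedelta):
        # Drop whole hour partitions older than the retention window.
        cutoff = datetime.now(timezone.utc) - keep
        for hour_dir in root.glob(f"{measurement}/year=*/month=*/day=*/hour=*"):
            parts = dict(p.split("=") for p in
                         hour_dir.relative_to(root / measurement).parts)
            ts = datetime(int(parts["year"]), int(parts["month"]),
                          int(parts["day"]), int(parts["hour"]),
                          tzinfo=timezone.utc)
            if ts < cutoff:
                shutil.rmtree(hour_dir)  # append-only data: cheap directory drop

Corrections would work the same way at a finer grain: read the affected partition, patch or filter rows, and atomically swap in the rewritten files.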
The current focus is on predictable write throughput and analytical query performance, but schema evolution and partial rewrites are definitely on the roadmap.