Comment by edinetdb

4 days ago

The "mostly-cold databases" framing maps well to something I ran into with financial data APIs: most listed companies are queried rarely — the long tail of small-caps sees < 1 req/day each, while the top ~200 companies account for the majority of traffic.

We ended up going in a different direction: BigQuery as canonical store with Gold-layer summary tables (~26MB) loaded into memory on startup and refreshed every 30 minutes. The in-memory path keeps p99 under 100ms for the hot tier, and BQ handles the cold tail without S3 round-trips. Simple to reason about, though it only works because the Gold tables are small enough to fit in memory.
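For anyone curious, the shape of that setup is roughly the following sketch. `fetch_gold_tables` and `query_cold` are hypothetical stand-ins (the real versions run BigQuery queries); the point is the structure: load the summary tables once on startup, swap in a fresh snapshot on a timer, and route misses to the canonical store.

```python
import threading
import time

def fetch_gold_tables():
    # Hypothetical stand-in for the BigQuery query that loads the
    # Gold-layer summary tables (~26MB in the real system).
    return {"AAPL": {"pe": 29.1}, "MSFT": {"pe": 35.4}}

class HotColdStore:
    """Serve hot-tier reads from an in-memory snapshot refreshed on a
    timer; fall back to the canonical store for the cold tail."""

    def __init__(self, load_hot, query_cold, refresh_seconds=1800):
        self._load_hot = load_hot        # loads the Gold summary tables
        self._query_cold = query_cold    # per-key lookup in the canonical store
        self._refresh_seconds = refresh_seconds
        self._hot = load_hot()           # loaded once on startup
        self._lock = threading.Lock()
        threading.Thread(target=self._refresh_loop, daemon=True).start()

    def _refresh_loop(self):
        while True:
            time.sleep(self._refresh_seconds)
            snapshot = self._load_hot()  # rebuild off the serving thread...
            with self._lock:
                self._hot = snapshot     # ...then swap the reference atomically

    def get(self, key):
        with self._lock:
            row = self._hot.get(key)
        if row is not None:
            return row                   # hot path: pure in-memory lookup
        return self._query_cold(key)     # cold tail: one round-trip to BQ
```

Swapping the whole snapshot (rather than mutating it in place) keeps readers consistent during a refresh, at the cost of briefly holding two copies — cheap when the tables are ~26MB.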

Turbolite feels like it targets a different point in the design space — where the data is too large to cache fully but still read-heavy. The write amplification trade-off seems acceptable for append-only daily ETL workloads where you're writing once and reading many times. Is the target workload truly immutable-after-creation, or do you expect point updates?