Comment by russellthehippo

4 days ago

Fantastic comment, thanks for jumping in. I have well-tuned replies here lol given we're working on exactly the same problems!

First: your prefetch solution is precisely what I have on the roadmap for the next level of optimization: B-tree introspection at the page-child level, not the table level. I haven't launched it yet, as I only realized the potential in the past couple of days and wanted to focus on stability before showing this to folks. I am 100% going to try the sqlite-prefetch approach in that experiment. I'm curious what kind of results you're seeing.
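To make "page-child level introspection" concrete, here's a minimal sketch of what I mean, based on the documented SQLite file format (interior table b-tree pages are type 0x05; each cell starts with a 4-byte big-endian child page number, and the rightmost child lives at header offset 8). The function and the synthetic page are purely illustrative, not turbolite's actual code:

```python
import struct

PAGE_SIZE = 4096  # any valid SQLite page size works for this sketch

def interior_children(page: bytes, header_offset: int = 0) -> list[int]:
    """Child page numbers from a SQLite interior table b-tree page.

    header_offset is 100 for page 1 (which carries the 100-byte
    database header), 0 for every other page.
    """
    h = header_offset
    if page[h] != 0x05:  # 0x05 = interior table b-tree page
        raise ValueError("not an interior table page")
    ncells = struct.unpack_from(">H", page, h + 3)[0]  # cell count
    children = []
    # cell pointer array begins right after the 12-byte interior header
    for i in range(ncells):
        cell_off = struct.unpack_from(">H", page, h + 12 + 2 * i)[0]
        # each interior cell starts with a 4-byte big-endian child pgno
        children.append(struct.unpack_from(">I", page, cell_off)[0])
    # the rightmost child pointer lives at header offset 8
    children.append(struct.unpack_from(">I", page, h + 8)[0])
    return children

# tiny synthetic interior page: children 3 and 5, rightmost child 7
page = bytearray(PAGE_SIZE)
page[0] = 0x05
struct.pack_into(">H", page, 3, 2)      # two cells
struct.pack_into(">I", page, 8, 7)      # rightmost child -> page 7
struct.pack_into(">H", page, 12, 1000)  # cell pointer 1
struct.pack_into(">H", page, 14, 1010)  # cell pointer 2
struct.pack_into(">I", page, 1000, 3)   # cell 1: child -> page 3
struct.pack_into(">I", page, 1010, 5)   # cell 2: child -> page 5
print(interior_children(bytes(page)))   # [3, 5, 7]
```

Once you can read children off an interior page, you can prefetch an entire subtree ahead of the cursor instead of waiting for table-level misses.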

On write amplification: you're right, re-uploading an entire page group when one page is dirty feels inefficient. The tradeoff is intentional: turbolite assumes all PUTs cost the same regardless of size, and it is aimed at optimizing queries on cold databases with bursty reads, not write-heavy workloads (though writes work too). Checkpoints are ideally infrequent due to upload latency, and the page group sizes are tunable (default 256 64KB pages = 16MB uncompressed, a few MB compressed on S3). For the use case I'm targeting, the read-locality wins dominate. That said, turbolite does let the user choose when to checkpoint locally (durable on disk) vs to S3 (atomic commit in case of disk crash or new compute).
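The group arithmetic is simple enough to sketch; the numbers below match the defaults I mentioned, but the `group_key` object-key scheme is hypothetical, just to show that one dirty page maps to one whole immutable group object:

```python
PAGES_PER_GROUP = 256        # default group size; tunable
PAGE_SIZE = 64 * 1024        # 64KB pages -> 16MB per group uncompressed

def group_of(pgno: int) -> int:
    # SQLite page numbers are 1-indexed, so group 0 holds pages 1..256
    return (pgno - 1) // PAGES_PER_GROUP

def group_key(db_id: str, pgno: int) -> str:
    # hypothetical S3 key scheme; one immutable object per group
    return f"{db_id}/groups/{group_of(pgno):08d}"

print(group_of(1), group_of(256), group_of(257))  # 0 0 1
print(group_key("mydb", 300))                     # mydb/groups/00000001
```

Dirtying page 300 means re-uploading all of group 1, which is exactly the write amplification you flagged; the bet is that the flat PUT pricing and cold-read locality pay for it.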

On LTX temporal locality vs page-group spatial locality: agreed, they're different design goals. LTX optimizes for transaction-aware features like PITR. turbolite optimizes for cold query speed from object storage. You could imagine a system that does both (WAL shipping for durability + grouped pages for reads), which is roughly where my roadmap goes. In fact, WAL + immutable page groups on checkpoints gives a much faster restore time than Litestream's snapshots+WAL, if only because uploading the entire snapshot on each commit is slow. I'm starting to explore embedded WAL shipping in my walrust project: https://github.com/russellromney/walrust.

On frontrun vs reactive: I have very specific benchmarks for this that I think you'll be interested in. The tiered-bench binary has a --plan-aware flag that toggles between a prefetch schedule (reactive, similar in spirit to your sibling detection) and frontrun (EQP-based). See benchmark/README.md for details. On 100K rows, local to Tigris, EQP is:

- who-liked (one-to-many JOIN): 4.4x faster cold, 1.4x with interior cached

- scan + filter: 2.9x faster cold

- indexed filter: 2.9x faster cold

- point lookup + JOIN: 1.8x cold

- simple queries (mutual friends): 1.3x, less to gain

The wins are largest on cold index lookups and SCANs where reactive prefetch has to discover the scan through sequential misses. Frontrun knows the full plan upfront and fires everything in parallel. For single-table scans, reactive catches up quickly after 1-2 cache misses, so the gap is smaller.

Finally, I included a query tuner CLI in the repo that lets you compare different prefetch aggression levels for a given query on given data. You may find it useful.