Comment by 3eb7988a1663
1 day ago
While I suspect DuckDB would compress better, given the ubiquity of SQLite, it seems a fine standard choice.
The data is dominated by big, unique TEXT columns; I'm unsure how that can compress much better when grouped, but it would be interesting to know.
I was thinking more of the numeric columns, which have built-in compression mechanisms to handle incrementing values or long runs of identical values. That's certainly less total data than the text, but my prior is that the two should perform equivalently on the text, so the better compression on numbers should let DuckDB pull ahead.
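For anyone curious what DuckDB actually picks for such columns, a rough sketch is below. It builds a small persistent table with an incrementing column and a constant column, checkpoints it (compression is chosen when blocks are written to disk), and then asks `pragma_storage_info` which scheme was used per column. File and table names are made up for illustration:

```python
import duckdb

# Persistent file so compression is actually applied at checkpoint time
# ("compress_demo.duckdb" and table "t" are illustrative names).
con = duckdb.connect("compress_demo.duckdb")
con.execute("""
    CREATE OR REPLACE TABLE t AS
    SELECT range AS id,          -- incrementing integers
           42    AS flag         -- long run of an identical value
    FROM range(1000000)
""")
con.execute("CHECKPOINT")  # compression is decided when data is written out

# Report which compression scheme DuckDB chose for each column's segments.
rows = con.execute("""
    SELECT column_name, compression, count(*) AS segments
    FROM pragma_storage_info('t')
    GROUP BY 1, 2
    ORDER BY 1, 2
""").fetchall()
for column_name, compression, segments in rows:
    print(column_name, compression, segments)
```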
I had to run a test for myself. Using sqlite2duckdb (no research, just the first search hit) on a randomly picked shard (1636), the sqlite.gz was 4.9 MB, but the duckdb.gz was 3.7 MB.
The uncompressed sizes favor SQLite, which does not make sense to me; I'm not sure if DuckDB keeps around more statistics information. Uncompressed: SQLite 12.9 MB, DuckDB 15.5 MB.
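For reference, a sketch of roughly that test without sqlite2duckdb: attach the SQLite shard through DuckDB's sqlite extension, copy each table into a DuckDB file, then gzip both and compare sizes. The file paths are placeholders, and this assumes the sqlite extension can be installed in your environment:

```python
import gzip
import os
import shutil
import sqlite3

import duckdb

SQLITE_PATH = "shard_1636.sqlite"   # placeholder: the SQLite shard to convert
DUCKDB_PATH = "shard_1636.duckdb"   # placeholder: output DuckDB file

# List the user tables straight from the SQLite file.
with sqlite3.connect(SQLITE_PATH) as sq:
    tables = [r[0] for r in sq.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Copy each table into a persistent DuckDB database via the sqlite extension.
con = duckdb.connect(DUCKDB_PATH)
con.execute("INSTALL sqlite; LOAD sqlite;")
con.execute(f"ATTACH '{SQLITE_PATH}' AS src (TYPE sqlite)")
for name in tables:
    con.execute(f'CREATE TABLE "{name}" AS SELECT * FROM src."{name}"')
con.execute("CHECKPOINT")
con.close()

def gz_size(path: str) -> int:
    """Gzip a copy of the file and return the compressed size in bytes."""
    gz_path = path + ".gz"
    with open(path, "rb") as f_in, gzip.open(gz_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return os.path.getsize(gz_path)

for path in (SQLITE_PATH, DUCKDB_PATH):
    print(f"{path}: {os.path.getsize(path) / 1e6:.1f} MB raw, "
          f"{gz_size(path) / 1e6:.1f} MB gzipped")
```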