Comment by gopalv

9 years ago

> Some systems (like Cassandra) has upserts and can do writes without reading.

That forces the latest-version resolution into a read-side problem - systems like that cannot handle any sort of complete table-scans but can only really work for a pre-known key.

With a known unique key, this is somewhat possible to use systems like this, but extremely expensive when what you need is BETWEEN '2017-01-01' and '2017-01-02'.

3 comments

gopalv

doikor 9 years ago

We have a lot of data like that and just use a compound partition key of (year, month, day) tuple and then we can ask stuff on day resolution. We also add exact timestamp into clustering key so you can only ask for more precise data.

With Cassandra (well most nosql dbs that I've run into) you model your data in the db by the queries. So if you want to scan by date then you create a new table (or a materialized view) with the date as the key you query for. Though even for then you have to keep in mind the amount of data to make sure you don't have too much data per partition and hotspots.

jjirsa 9 years ago

Cassandra has 2 components to its primary key - 1 that sets the partition (which includes which nodes get the data), and 1 that sets the clustering/sorting/ordering within that partition.

If you use a clustering key, scans become trivial within a given partition.

gegtik 9 years ago

depends on your structure. you could scan though dates as a partition's cluster key for example