Comment by pupdogg

5 years ago

You put a loss-mitigation strategy in place, and that strategy will vary by application. In my case, I have a similar setup where we write 25-30k records to SQLite daily. We start each day fresh with a new SQLite DB file (named yyyy-mm-dd.db) and back it up to AWS S3 daily under the scheme /app_name/data/year/month/file. That works out to roughly 9-11 million records a year, or 365 mini SQLite DBs of 25-30k records each. Portability is another awesome trait of SQLite. Then, at the end of each week, we use AWS Glue (PySpark, specifically) to process those seven daily database files into a single snappy-compressed Parquet file, which is then imported into ClickHouse for analytics and reporting.
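
In case it's useful, here's a rough sketch of the two moving parts. This is not our actual code: the bucket, prefix, and table names (`app_name`, `events`) are placeholders, and the real roll-up runs as an AWS Glue job rather than a bare PySpark script.

```python
# Daily backup: upload today's SQLite file to S3 under /app_name/data/year/month/.
# Bucket name "app_name" and the local file path are placeholders.
import datetime
import boto3

today = datetime.date.today()
db_file = f"{today:%Y-%m-%d}.db"                          # e.g. 2020-06-01.db
key = f"app_name/data/{today:%Y}/{today:%m}/{db_file}"
boto3.client("s3").upload_file(db_file, "app_name", key)
```

```python
# Weekly roll-up: read the week's daily SQLite files, union them, and write one
# snappy-compressed Parquet output for ClickHouse to import. Table name "events"
# and all paths are placeholders; s3:// output paths work as-is inside Glue.
import glob
import sqlite3
from contextlib import closing

import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("weekly-sqlite-to-parquet").getOrCreate()

frames = []
for db_path in sorted(glob.glob("/tmp/dbs/*.db")):        # the week's seven files
    with closing(sqlite3.connect(db_path)) as conn:
        frames.append(pd.read_sql("SELECT * FROM events", conn))

weekly_df = spark.createDataFrame(pd.concat(frames, ignore_index=True))

(weekly_df
    .coalesce(1)                                          # one Parquet file per week
    .write
    .mode("overwrite")
    .option("compression", "snappy")
    .parquet("s3://app_name/weekly/2020-06-07/"))
```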

At any given point in time, we retain 7 years' worth of files in S3; that's roughly 2,555 files (365 × 7) for under $10/month. Anything older is archived into AWS Glacier, all while the data is still accessible within ClickHouse. As of right now, we have 12 years' worth of data.
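
One way to handle the Glacier step is an S3 lifecycle rule. A minimal boto3 sketch, again with a placeholder bucket name and prefix:

```python
# Transition backups older than ~7 years from S3 to Glacier.
# Bucket name and key prefix are placeholders.
import boto3

boto3.client("s3").put_bucket_lifecycle_configuration(
    Bucket="my-backup-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-old-sqlite-backups",
            "Filter": {"Prefix": "app_name/data/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 7 * 365, "StorageClass": "GLACIER"}],
        }]
    },
)
```

Hope it helps!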