Comment by whacked_new
6 days ago
Super interested in this as well (and thank you for Caddy)
How does this handle data updating / fixing? My use case is importing data that's semi-structured. Say you get data from a 3rd-party provider in one dump, and it's for an event called "jog". Then they update their dump format so "jog" is subdivided into "light run" vs "intense walk", and they apply it retroactively. In that case you'd have to reimport a load of overlapping data.
I saw the FAQ, but it only talks about imports not being strictly additive.
I'm dealing with similar use cases of evolving data and don't want to deal with SQL updates, so I end up working entirely in plain text. One advantage is that you can use git for time traveling (for a single user it still works reasonably well).
Glad you like Caddy!
> How does this handle data updating / fixing?
In the advanced import settings, you can customize what makes an item unique or a duplicate. You can also configure how to handle duplicates. By default, duplicates are skipped. But they can also be updated, and you can customize what gets updated and which of the two values to keep.
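To make that concrete, here's a rough sketch of the shape of such a policy in Go (the names are illustrative only, not the actual settings):

    package main

    import "fmt"

    // Hypothetical sketch of a dedup policy; field names do not
    // correspond to the real import settings.
    type DedupPolicy struct {
        UniqueOn       []string // fields that together identify "the same" item
        OnDuplicate    string   // "skip" (the default) or "update"
        UpdateFields   []string // when updating, which fields get overwritten
        PreferIncoming bool     // true = incoming value wins; false = keep existing
    }

    func main() {
        policy := DedupPolicy{
            UniqueOn:       []string{"timestamp", "classification"},
            OnDuplicate:    "update",
            UpdateFields:   []string{"classification"},
            PreferIncoming: true, // e.g. "jog" gets replaced by "light run"
        }
        fmt.Printf("%+v\n", policy)
    }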
But yes, updates do run an UPDATE query, so they're irreversible. I explored schemas that were purely additive, so that you could traverse through mutations of the timeline, but this got messy real fast, and made exploring (reading) the timeline more complex/slow/error-prone. I do think it would be cool though, and I may still revisit that, because I think it could be quite beneficial.
Thanks for the reply! I'll have to try this out... it almost looks like what perkeep was meant to become.
One interesting scenario re time traveling is if we use an LLM somewhere in data derivation. Say there's a secondary processor of e.g. journal notes that yields one kind of feature extraction; if the model gets updated at some point, the output possibilities expand very quickly. We might also allow human intervention/correction, which should take priority and resist overwrites. Assuming we're caching these data, they'll also land somewhere in the database, and unless provenance is first-class, they'll look just as much like ground truth as anything else.
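A rough sketch of what I mean by provenance being first-class (hypothetical field names, just to show the precedence rule):

    package main

    import (
        "fmt"
        "time"
    )

    // Each derived fact carries its provenance, so a newer model run
    // can't silently clobber ground truth or human corrections.
    type DerivedFact struct {
        ItemID    string    // the source item, e.g. a journal note
        Value     string    // the extracted feature, e.g. an activity label
        Source    string    // "import", "llm", or "human"
        ModelVer  string    // which model produced it, when Source == "llm"
        DerivedAt time.Time // when this value was produced
    }

    // Human corrections always win; otherwise newer derivations replace older ones.
    func shouldOverwrite(existing, incoming DerivedFact) bool {
        if existing.Source == "human" {
            return false
        }
        if incoming.Source == "human" {
            return true
        }
        return incoming.DerivedAt.After(existing.DerivedAt)
    }

    func main() {
        old := DerivedFact{ItemID: "note-1", Value: "jog", Source: "llm",
            ModelVer: "v1", DerivedAt: time.Now().Add(-24 * time.Hour)}
        corrected := DerivedFact{ItemID: "note-1", Value: "light run",
            Source: "human", DerivedAt: time.Now()}
        fmt.Println(shouldOverwrite(old, corrected)) // true
    }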
Bitemporal databases look interesting, but the amount of scaffolding on top of SQLite makes the data harder to manage.
So if I keep the ground-truth data as text, it looks like I'll have an import pipeline into Timelinize, make sure there's a stable pkey (almost certainly timestamp + qualifier), and always overwrite. Seems feasible, pretty exciting!
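Something like this for the staging side (a sketch of my own pipeline idea, not Timelinize's actual schema):

    package main

    import (
        "database/sql"
        "log"

        _ "github.com/mattn/go-sqlite3" // any SQLite driver works
    )

    func main() {
        db, err := sql.Open("sqlite3", "staging.db")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // Stable primary key: timestamp + qualifier.
        _, err = db.Exec(`CREATE TABLE IF NOT EXISTS events (
            ts        TEXT NOT NULL,
            qualifier TEXT NOT NULL,
            label     TEXT,
            PRIMARY KEY (ts, qualifier)
        )`)
        if err != nil {
            log.Fatal(err)
        }

        // "Always overwrite": re-importing the corrected dump just
        // replaces the label for the same (ts, qualifier).
        _, err = db.Exec(`INSERT INTO events (ts, qualifier, label) VALUES (?, ?, ?)
            ON CONFLICT(ts, qualifier) DO UPDATE SET label = excluded.label`,
            "2024-05-01T07:30:00Z", "activity", "light run")
        if err != nil {
            log.Fatal(err)
        }
    }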
Have you heard of XTDB / bitemporality? The basic idea is to make time 2-dimensional: each record has both a System Time range and a Valid Time range. It's designed as an append-only DB with full auditability for compliance purposes.
With 2D time you can ask complex questions about what you knew when, while simpler questions are automatically extended into questions about the current time.
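For example, a minimal sketch of the idea (nothing XTDB-specific, just the two ranges and an "as of" filter):

    package main

    import (
        "fmt"
        "time"
    )

    // Each fact carries two ranges: valid time (when it was true in the
    // world) and system time (when the database believed it).
    type Fact struct {
        Value              string
        ValidFrom, ValidTo time.Time // valid time range
        SysFrom, SysTo     time.Time // system time range; zero SysTo = still current
    }

    // "What did we believe was true at validAt, as of knownAt?"
    func asOf(facts []Fact, validAt, knownAt time.Time) []Fact {
        var out []Fact
        for _, f := range facts {
            valid := !validAt.Before(f.ValidFrom) && validAt.Before(f.ValidTo)
            known := !knownAt.Before(f.SysFrom) && (f.SysTo.IsZero() || knownAt.Before(f.SysTo))
            if valid && known {
                out = append(out, f)
            }
        }
        return out
    }

    func main() {
        day := func(d int) time.Time { return time.Date(2024, 5, d, 0, 0, 0, 0, time.UTC) }
        facts := []Fact{
            // Originally imported as "jog"; the provider relabels it retroactively.
            {Value: "jog", ValidFrom: day(1), ValidTo: day(2), SysFrom: day(3), SysTo: day(10)},
            {Value: "light run", ValidFrom: day(1), ValidTo: day(2), SysFrom: day(10)},
        }
        fmt.Println(asOf(facts, day(1), day(5)))  // "jog": what we knew on day 5
        fmt.Println(asOf(facts, day(1), day(12))) // "light run": what we know now
    }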
You probably don't want to just switch to XTDB, but if you pursue this idea, I think 2D time is schematically the correct conceptualization for this problem.
Docs: https://docs.xtdb.com/concepts/key-concepts.html#temporal-co... | 2025 Blog: https://xtdb.com/blog/diy-bitemporality-challenge | Visualization tool: https://docs.xtdb.com/concepts/key-concepts.html#temporal-co...
Yeah, I did actually pursue this for a time (heh), but I might revisit it later. It was too much complexity for debatable value-add, though the value is growing on me.