Comment by throwaway2037
2 years ago
Wild post. No trolling: What is the cross section of CRDT and info sec? Usually, "CRDT" is like catnip on HN. And, yes, I am a major fanboi/fangurl of CRDT. The first time I ever watched a networked text editor with "simultaneously" blinking cursors where two humans were editing in parallel was mind blowing. It opens your mind to many other collaborative editing ideas.
I work in the SIEM space, which basically involves ingesting massive amounts of data (relatively speaking). A single customer can ingest terabytes a day, or even 10s to 100s of terabytes of data a day. And you want to run near-arbitrary realtime analytics on it + batch analytics on it. It's a fun, difficult problem.
My product's big thing was to extract the data from logs into a graph data structure. The thing is that I've just taken "huge amount of scale + nice, immutable log" and turned it into "huge amount of scale + evil, mutable graph". Building a massive-scale graph data structure that can be mutated over time is... hard. Like, "hope you've been keeping up on your academic papers" hard.
One of the key optimizations I leveraged was to represent the graph as a CRDT. Every Node has a `merge` function that follows CRDT semantics.
This allows me to collapse states together in a way that converges.
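To make the `merge` idea concrete, here is a minimal sketch of what a state-based CRDT node could look like. The `Node` fields (`first_seen`, `last_seen`, `edges`) are made up for illustration and are not the actual product's schema; the point is only that each field merges with a commutative, associative, idempotent operation (min, max, set union), so the order in which states arrive does not matter.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """Hypothetical graph node; field names are illustrative assumptions."""
    node_id: str
    first_seen: int  # earliest timestamp observed (merged with min)
    last_seen: int   # latest timestamp observed (merged with max)
    edges: frozenset = field(default_factory=frozenset)  # grow-only set (merged with union)

    def merge(self, other: "Node") -> "Node":
        """Join two states of the same node; never loses information."""
        if self.node_id != other.node_id:
            raise ValueError("can only merge states of the same node")
        return Node(
            node_id=self.node_id,
            first_seen=min(self.first_seen, other.first_seen),
            last_seen=max(self.last_seen, other.last_seen),
            edges=self.edges | other.edges,
        )


# Three partial views of the same node, e.g. from different log shards.
a = Node("proc-1", first_seen=10, last_seen=20, edges=frozenset({"file-a"}))
b = Node("proc-1", first_seen=5, last_seen=15, edges=frozenset({"ip-b"}))
c = Node("proc-1", first_seen=12, last_seen=30, edges=frozenset({"file-a", "file-c"}))

# Any merge order yields the same converged state.
left = a.merge(b).merge(c)
right = c.merge(a.merge(b))
```

Because `merge` is a join on a lattice, replaying or duplicating an update is harmless: `a.merge(a)` is just `a`.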
Security queries have some interesting properties:
1. They often care about thresholds, meaning that they inherently work well with a lattice (once you've hit a "bad" state you will always want to investigate that state - this is unlike, say, operations where if it "recovers" you can ignore it)
2. They almost always filter out data
These two properties combine nicely. It means that if our alert is a threshold, and our data only 'grows' in one direction (thanks CRDTs), we can reject queries using stale data and not worry about invalidating any caches.
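One reading of this, as a rough sketch: if a value can only grow under merge and the alert is "value >= threshold", then an alert that fires on a stale snapshot still fires after any later merge, so that answer never needs to be invalidated. The names below (`THRESHOLD`, `merge_count`, `alert_fires`) are hypothetical, and a plain max-merged counter stands in for whatever the real node state is.

```python
THRESHOLD = 5  # hypothetical alert threshold


def merge_count(a: int, b: int) -> int:
    """Grow-only counter merge: the merged value is never lower than either input."""
    return max(a, b)


def alert_fires(count: int, threshold: int = THRESHOLD) -> bool:
    """Monotone predicate: once true for a count, true for every larger count."""
    return count >= threshold


# A stale cached snapshot that already crossed the threshold.
stale = 7

# Merging in any other state (older or newer) cannot un-fire the alert.
still_fires = all(alert_fires(merge_count(stale, other)) for other in range(0, 20))

# The converse does not hold: a stale "no alert" is not final,
# because a fresher state may push the count over the threshold.
stale_low = 3
fires_after_update = alert_fires(merge_count(stale_low, 6))
```

So in this sketch only the positive answer is cache-safe; a negative answer from stale data still has to be rechecked against fresher state.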