
Comment by oulipo2

3 days ago

Could you give a brief overview of:

- what libs you were using
- what kinds of algos/models were most useful for what kinds of data?

I have an IoT use case; I wanted to look at both NNs and more classical stats models to see if there's value in it.

Can't, for obvious reasons. But no specialized libraries were used: just the usual Python stack that comes packaged with any respectable OS distribution these days, mixed with other close-to-the-metal languages for performance or API-compatibility reasons.

Look up nonparametric statistical process control and you will find useful papers. The algorithms are actually quite simple to implement; if an algorithm is not simple, it is probably not worth your time. The analysis in the paper might be complicated, but don't worry about that; look for simplicity in the algorithm itself.
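One classic scheme in that family is a sign-based CUSUM: each observation scores +1 or -1 depending on whether it falls above or below the in-control median, so no distributional assumption is needed. A minimal sketch, with the reference value `k` and alarm threshold `h` as illustrative choices (they are not from this thread):

```python
import numpy as np

def sign_cusum(x, median, k=0.5, h=8.0):
    """One-sided nonparametric CUSUM on signs.

    Each observation above the in-control `median` scores +1, below
    scores -1 (ties score 0).  The statistic is
    S_t = max(0, S_{t-1} + sign_t - k), and we alarm when S_t >= h.
    """
    s, alarms = 0.0, []
    for t, v in enumerate(x):
        step = (1.0 if v > median else (-1.0 if v < median else 0.0)) - k
        s = max(0.0, s + step)
        if s >= h:
            alarms.append(t)
            s = 0.0  # restart the statistic after an alarm
    return alarms
```

Under control the +1/-1 scores roughly cancel against `k`, so the statistic hovers near zero; after an upward shift, every observation contributes `1 - k` and the statistic climbs steadily until it crosses `h`.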

Did similar work at a similar scale to srean.

Assume you have signal from one IoT device, say a sensor reading. Anomalies are sudden changes in the value of the signal. Define sudden (using the time delta between observations and your other domain knowledge); let's say the sensor reports 1x/second and sudden means 1-3 minutes.

Simple option: compare a rolling mean over the last 3 values with a rolling mean over the last 60 values. If the difference between the two exceeds a threshold, alert.
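A minimal sketch of that short-window vs. long-window comparison (the window sizes match the 3/60 example above; the function names and the threshold are illustrative):

```python
import numpy as np

def trailing_means(x, w):
    """Trailing mean ending at each index t >= w-1; length len(x)-w+1."""
    return np.convolve(x, np.ones(w) / w, mode="valid")

def alerts(x, short=3, long=60, threshold=2.0):
    """Indices t where the short- and long-window trailing means,
    both ending at t, differ by more than `threshold` (signal units)."""
    s = trailing_means(x, short)          # ends at indices short-1 .. n-1
    l = trailing_means(x, long)           # ends at indices long-1 .. n-1
    diff = np.abs(s[long - short:] - l)   # align both to end at long-1 .. n-1
    return np.nonzero(diff > threshold)[0] + (long - 1)
```

The short window reacts to a sudden change within a few samples while the long window is still dominated by the old level, so their difference spikes exactly when "sudden" (as defined above) happens.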

Say the readings are normally distributed, or can be detrended/made approximately normal via a simple 1- or 2-stage AR/MA model. Then apply the https://en.wikipedia.org/wiki/Western_Electric_rules to detect anomalies.
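Two of the Western Electric rules are easy to sketch on a standardized signal: rule 1 (a single point beyond 3 sigma) and rule 4 (eight consecutive points on the same side of the centerline). Rules 2 and 3 follow the same zone pattern with the 2-sigma and 1-sigma bands. A sketch, assuming you already have in-control estimates of the mean and sigma:

```python
import numpy as np

def western_electric(x, mean, sigma):
    """Flag rule-1 and rule-4 violations of the Western Electric rules.

    Rule 1: a single point more than 3 sigma from the centerline.
    Rule 4: eight consecutive points on the same side of the centerline.
    """
    z = (np.asarray(x, dtype=float) - mean) / sigma
    rule1 = np.abs(z) > 3.0

    side = np.sign(z)
    rule4 = np.zeros(len(z), dtype=bool)
    run = 0  # length of the current same-side streak
    for i, s in enumerate(side):
        if s != 0 and i > 0 and s == side[i - 1]:
            run += 1
        else:
            run = 1 if s != 0 else 0
        if run >= 8:
            rule4[i] = True
    return rule1, rule4
```

Rule 1 catches spikes; rule 4 catches small sustained level shifts that never cross the 3-sigma band, which is exactly the kind of drift a single threshold misses.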

A more complex, but still simple, option: say you have IoT sensors over a larger area, and an anomaly is one sensor reading higher than the others. Run roughly the same analysis as above, but on the correlation matrix of all the sensors, and look for rapidly changing correlations.

Example: temperature sensors in each room of your house, and your kid opens the front door to go play in the snow. The entry hall cools down while the rest of the house's temperature stays roughly stable. You can picture what that does to the correlation matrix.
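One crude way to realize the correlation-matrix idea is to compare the correlation matrix over the most recent window against the one from the window before it, and alert when any entry moves sharply. A sketch, with the window size and threshold as arbitrary illustrative choices:

```python
import numpy as np

def corr_change_alerts(readings, window=60, threshold=0.5):
    """readings: array of shape (n_samples, n_sensors).

    For each time t, compute the sensor correlation matrix over the
    last `window` samples and over the window before that; alert when
    the largest entry-wise change exceeds `threshold`."""
    n, _ = readings.shape
    alerts = []
    for t in range(2 * window, n + 1):
        prev = np.corrcoef(readings[t - 2 * window : t - window].T)
        curr = np.corrcoef(readings[t - window : t].T)
        if np.nanmax(np.abs(curr - prev)) > threshold:
            alerts.append(t - 1)
    return alerts
```

In the front-door example, the entry-hall sensor's row and column of the matrix collapse toward zero while the rest stay near one, so the entry-wise delta lights up on exactly that sensor's pairings.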

  • Bang on.

It was a little more complicated to remove temporal dependencies from the original streams, and we could not rely on Gaussian behaviour. Other than that, it's pretty much the same, barring an effort to keep the alerting thresholds unaffected by recent anomalies.