Comment by whazor

1 month ago

We store the data because we might need to know it. We only discover we didn’t need to know it once we’ve finished knowing it.

binarylogic 1 month ago

Agree to an extent. There are absolutely unknown unknowns. But I think you'd be surprised how much data is obviously waste. Not the grey area, just pure garbage: health checks, debug logs left in production, redundant attributes.

That's why we break waste down into categories: https://docs.usetero.com/data-quality/categories/overview
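As a rough illustration of the "pure garbage" cases named above, here is a minimal sketch of a rule-based check over log records. The category names, the record shape (`level`, `attributes`, `http.path`), and the health-check paths are all assumptions made for this example; they are not taken from the linked docs.

```python
from typing import Optional

# Hypothetical health-check endpoints; adjust to whatever your services expose.
HEALTH_CHECK_PATHS = {"/health", "/healthz", "/ready", "/live"}


def classify_waste(record: dict, environment: str = "production") -> Optional[str]:
    """Return a waste category for a log record, or None if it is not obviously waste.

    The category names are illustrative only, not an official taxonomy.
    """
    attrs = record.get("attributes", {})

    # Health checks: requests to well-known probe endpoints.
    if attrs.get("http.path") in HEALTH_CHECK_PATHS:
        return "health_check"

    # Debug logs left in production.
    if environment == "production" and record.get("level", "").lower() == "debug":
        return "debug_in_production"

    # Redundant attributes: a crude heuristic that flags a record when two
    # different keys carry the same string value.
    values = [v for v in attrs.values() if isinstance(v, str)]
    if len(values) != len(set(values)):
        return "redundant_attribute"

    return None
```

Anything that returns `None` here is the grey area: it may still be waste, but a simple rule like this cannot tell.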

But we don't stop there. You can go deeper with reasoning to root out the more nuanced waste. It's hard, but it's possible. That's where things get interesting.