Comment by whazor

5 hours ago

We store the data because we might need to know it. We only discover we didn’t need to know it once we’ve finished knowing it.

Agree to an extent. There are absolutely unknown unknowns. But I think you'd be surprised how much data is obviously waste. Not the grey area, just pure garbage: health checks, debug logs left in production, redundant attributes.
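
To make "obviously waste" concrete, here's a rough Python sketch of flagging those three kinds of records. This is only an illustration, not how any particular product does it: the record shape, the field names (`path`, `level`, `env`, `attributes`), the `HEALTH_PATHS` set, and the `classify_waste` helper are all assumptions made up for the example.

```python
from typing import Optional

# Assumed health-check endpoints; adjust to whatever your services expose.
HEALTH_PATHS = {"/health", "/healthz", "/ready", "/live"}


def classify_waste(record: dict) -> Optional[str]:
    """Return a waste category for an obviously wasteful record, else None.

    `record` is a hypothetical log record represented as a plain dict.
    """
    # Health checks: requests to a known probe endpoint.
    if record.get("path") in HEALTH_PATHS:
        return "health_check"

    # Debug logs left in production.
    if str(record.get("level", "")).lower() == "debug" and record.get("env") == "production":
        return "debug_in_production"

    # Redundant attributes: the same value stored under more than one key.
    # This is a crude heuristic and will produce false positives.
    values = [str(v) for v in record.get("attributes", {}).values()]
    if len(values) != len(set(values)):
        return "redundant_attributes"

    # Anything else is the grey area and needs more than a rule to judge.
    return None
```

Anything that returns `None` here is the grey area; rules like these only catch the pure garbage.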

That's why we break waste down into categories: https://docs.usetero.com/data-quality/categories/overview

But we don't stop there. You can go deeper with reasoning to root out the more nuanced waste. It's hard, but it's possible. That's where things get interesting.