
Comment by binarylogic

7 hours ago

I spent a decade in observability. Built Vector, spent three years at Datadog. This is what I think is broken with observability and why.

And how are you solving the problem? The article does not say.

> I'm answering the question your observability vendor won't

There was no question answered here at all. It's basically a teaser designed to attract attention and stir debate. Respectfully, it's marketing, not problem solving. At least, not yet.

  • The question is answered in the post: ~40% on average, sometimes higher. That's a real number from real customer data.

    But I'm an engineer at heart. I wanted this post to shed light on a real problem I've seen over a decade in this space, one that is causing a lot of pain, not to write a product walkthrough. But the solution is very much real. There's deep, hard engineering going on: building semantic understanding of telemetry, classifying waste into verifiable categories, processing it at the edge. It's not simple, and I hope that comes through in the docs.

    The docs get concrete if you want to peruse: https://docs.usetero.com/introduction/how-tero-works

    • I would contend that it is impossible to know a priori what is wasted telemetry and what isn’t, especially over long time horizons. And especially if you treat your logs as the foundational source of truth for answering critical business questions as well as operational ones.

      And besides, the value isn’t in knowing that the waste rate is 40% (and your methodology isn’t sufficiently disclosed for anyone to evaluate its accuracy). The value is in knowing what is or will be wasted. It’s reminiscent of that old marketing complaint: “I know that half my advertising budget is wasted; I just don’t know which half.”

      Storage is actually dirt cheap. The real problem, in my view, is not that customers are wasting storage, but that storage is being used inefficiently, that the storage formats aren’t always mechanically sympathetic to, or cloud-spend-efficient for, the ways the data is read and analyzed, and that there’s still this culturally grounded, disparate (and artificial) treatment of application and infrastructure logs vs business records.

I'm curious about the deep details, but the link 404s.

  • My apologies, I fixed the link. So much for restructuring the docs the night before posting this.

    You can read more here: https://docs.usetero.com/data-quality/overview

    To loosely describe our approach: it's intentionally transparent. We start with obvious categories (health checks, debug logs, redundant attributes) that you can inspect and verify. No black box.

    But underneath, Tero builds a semantic understanding of your data. Each category represents a progression in reasoning, from "this is obviously waste" to "this doesn't help anyone debug anything." You start simple, verify everything, and go deeper at your own pace.
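
For readers wondering what inspectable checks for the "obvious categories" named above (health checks, debug logs, redundant attributes) could look like, here is a minimal rule-based sketch. It is not Tero's implementation, which the thread does not describe. The `LogRecord` type, the `classify` function, the list of health-check paths, and the use of an `http.target` attribute are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical record shape for illustration; real telemetry schemas differ.
@dataclass
class LogRecord:
    body: str
    severity: str = "INFO"
    attributes: dict = field(default_factory=dict)

# Assumed health-check endpoints; a real deployment would configure these.
HEALTH_CHECK_PATHS = ("/health", "/healthz", "/ready", "/livez")

def classify(record, resource_attributes=None):
    """Return the candidate-waste categories a record falls into.

    Each rule is a plain, readable predicate, so a reviewer can check
    why a record was flagged before anything is dropped.
    """
    resource_attributes = resource_attributes or {}
    categories = []

    # Rule 1: request logs for health-check endpoints.
    if record.attributes.get("http.target", "") in HEALTH_CHECK_PATHS:
        categories.append("health_check")

    # Rule 2: debug- or trace-level logs.
    if record.severity.upper() in ("DEBUG", "TRACE"):
        categories.append("debug_log")

    # Rule 3: attributes that repeat a value already carried at the
    # resource level with the same key.
    redundant = [
        key for key, value in record.attributes.items()
        if key in resource_attributes and resource_attributes[key] == value
    ]
    if redundant:
        categories.append("redundant_attributes")

    return categories
```

Calling `classify(record, resource_attributes)` returns a list of labels rather than a drop decision, which keeps the "verify everything" step in human hands. Rules like these only cover the obvious end of the spectrum; the harder judgment raised earlier in the thread, whether a record will ever be needed, is not something a static rule can settle.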