Comment by jamiemallers

1 day ago

"No alternative" isn't quite right anymore, though I understand the feeling. The real problem with Datadog isn't the pricing - it's that their per-host model incentivizes you to care about infrastructure topology rather than user-facing behavior. You end up with 10,000 dashboards and still can't answer "is checkout broken right now?"

The open source stack has gotten genuinely viable: Prometheus/VictoriaMetrics for metrics, Grafana for viz, and OpenTelemetry as the collection layer, which means you're not locked into anyone's agent. The gap used to be in correlation (connecting a metric spike to a trace to a log line), but that's narrowed significantly.
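
To make "correlation" concrete, here's a rough sketch of the idea in plain Python. It assumes each metric sample carries the ID of the trace that produced it, and that log lines are tagged with the same ID. The class names, fields, threshold, and sample data are all made up for illustration; this is not any particular tool's API.

```python
from dataclasses import dataclass


@dataclass
class MetricSample:
    ts: float        # unix seconds
    value: float     # e.g. request latency in ms
    trace_id: str    # assumed: the trace that produced this sample


@dataclass
class LogLine:
    ts: float
    trace_id: str    # assumed: logs are tagged with the same trace ID
    message: str


def find_spikes(samples, threshold):
    """Return the samples whose value exceeds a fixed threshold."""
    return [s for s in samples if s.value > threshold]


def logs_for_spikes(samples, logs, threshold):
    """Follow each spiking sample's trace ID to the log lines that share it."""
    by_trace = {}
    for line in logs:
        by_trace.setdefault(line.trace_id, []).append(line)
    return {
        s.trace_id: [line.message for line in by_trace.get(s.trace_id, [])]
        for s in find_spikes(samples, threshold)
    }


# Hypothetical data: one normal request and one slow one.
samples = [
    MetricSample(100.0, 120.0, "t1"),
    MetricSample(101.0, 2400.0, "t2"),
]
logs = [
    LogLine(100.1, "t1", "checkout ok"),
    LogLine(101.2, "t2", "payment gateway timeout"),
]

result = logs_for_spikes(samples, logs, threshold=1000.0)
print(result)
```

The whole trick is the shared ID: once metrics, traces, and logs all carry it, getting from a spike to the relevant log line is a lookup rather than a search.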

The actual hard part of leaving DD isn't technical, it's organizational. DD becomes load-bearing for on-call runbooks, alert routing, and team muscle memory. Migration is less "swap the backend" and more "retrain your incident response."

If you're evaluating: the question I'd ask isn't "which vendor has the best dashboards" but "can I get from alert to root cause in under 5 minutes with this tool?" That's the metric that actually correlates with MTTR, and it's where most monitoring setups (including expensive ones) fail.
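
If you want to put a number on that question, one rough way is to record when each alert fired and when someone identified the root cause, then check how many incidents came in under the 5 minute budget. A minimal sketch, assuming you already have those two timestamps per incident; the function name and the incident data are hypothetical.

```python
from datetime import datetime, timedelta

BUDGET = timedelta(minutes=5)


def share_within_budget(incidents, budget=BUDGET):
    """Fraction of incidents where root cause was found in under `budget`.

    `incidents` is a list of (alert_fired_at, root_cause_found_at) pairs.
    """
    if not incidents:
        return 0.0
    hits = sum(1 for fired, found in incidents if found - fired < budget)
    return hits / len(incidents)


# Hypothetical incidents: one diagnosed in 3 minutes, one in 20.
incidents = [
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 3)),
    (datetime(2024, 1, 1, 14, 0), datetime(2024, 1, 1, 14, 20)),
]

share = share_within_budget(incidents)
print(f"{share:.0%} of incidents diagnosed in under 5 minutes")
```

Running the same calculation on a few real incidents for each tool you're trialing gives you something to compare besides dashboard screenshots.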