Comment by wise0wl
6 hours ago
The OpenTelemetry spec is absolutely what folks have been waiting for, for as long as I've been in computing (~20 years). A single standard that is implemented in nearly every popular language with very close feature parity. It's honestly wonderful to work with compared to the old vendor-supplied frameworks.
I took it upon myself to write a library for my current employer (4 years ago now?) that abstracted and standardized the way our Rust services instantiate and use the metrics and tracing fundamentals that OpenTelemetry provides. I recently added OTLP logging (technically using tracing events) so that baggage / context / metadata can be forwarded with the log lines. The `tracing` crate in Rust also has an attribute macro, `#[instrument]`, that lets you mostly auto-instrument your functions: the tracing context is extracted and propagated into your function, so the trace / span can be added to subsequent HTTP / gRPC requests.
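The extract-and-propagate step described above can be sketched without any crates. This is not the `tracing` or `opentelemetry` API; it's a hand-rolled illustration of the W3C Trace Context `traceparent` header (`00-<trace-id>-<parent-id>-<flags>`), the format OpenTelemetry's `TraceContextPropagator` reads and writes. The `TraceParent` type and `new_span_id` helper are hypothetical, only version `00` is accepted, and span ids come from std's `RandomState` rather than a proper RNG.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// A parsed W3C `traceparent` header (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
struct TraceParent {
    trace_id: String, // 32 lowercase hex chars, shared by every span in the trace
    span_id: String,  // 16 lowercase hex chars, identifies the current span
    flags: u8,        // bit 0 = sampled
}

fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

impl TraceParent {
    /// Parse an incoming header value; returns None if it is malformed.
    fn parse(header: &str) -> Option<Self> {
        let parts: Vec<&str> = header.trim().split('-').collect();
        if parts.len() != 4 || parts[0] != "00" {
            return None; // this sketch only accepts version 00
        }
        let (trace_id, span_id, flags) = (parts[1], parts[2], parts[3]);
        if !is_lower_hex(trace_id, 32) || !is_lower_hex(span_id, 16) || !is_lower_hex(flags, 2) {
            return None;
        }
        // All-zero trace or span ids are invalid per the spec.
        if trace_id.bytes().all(|b| b == b'0') || span_id.bytes().all(|b| b == b'0') {
            return None;
        }
        Some(TraceParent {
            trace_id: trace_id.to_string(),
            span_id: span_id.to_string(),
            flags: u8::from_str_radix(flags, 16).ok()?,
        })
    }

    /// Start a child span: same trace id, fresh span id, same sampling flags.
    fn child(&self) -> Self {
        TraceParent { trace_id: self.trace_id.clone(), span_id: new_span_id(), flags: self.flags }
    }

    /// Render as the header value for an outgoing HTTP / gRPC request.
    fn to_header(&self) -> String {
        format!("00-{}-{}-{:02x}", self.trace_id, self.span_id, self.flags)
    }
}

/// Random-ish non-zero 64-bit span id from std's RandomState (illustration only).
fn new_span_id() -> String {
    loop {
        let mut h = RandomState::new().build_hasher();
        h.write_u64(0);
        let id = h.finish();
        if id != 0 {
            return format!("{:016x}", id);
        }
    }
}

fn main() {
    // Header as it might arrive on an inbound request.
    let incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
    let ctx = TraceParent::parse(incoming).expect("valid traceparent");
    // Outbound calls carry the same trace id under a new span id.
    println!("traceparent: {}", ctx.child().to_header());
}
```

In a real service the propagator and span bookkeeping come from the OpenTelemetry SDK; the point here is only that the trace id survives every hop while each hop gets its own span id.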
We did all kinds of other stuff too, like adding a method for attaching the trace-id to our Kafka messages so we can see how long the entire lifetime of a request takes (including the time it sits on the queue). It's been extremely insightful.
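One way to carry a trace id through Kafka, sketched under assumptions: Kafka record headers are key / byte-value pairs, so the producer can add a `traceparent` header plus an enqueue timestamp, and the consumer can read both back to recover the trace and measure queue time. The `Message` struct stands in for a real client's record type (e.g. rdkafka has its own header API), and the `x-enqueued-at-ms` header name is hypothetical. This is not the library described above.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Minimal stand-in for a Kafka record (hypothetical type for this sketch).
#[derive(Debug, Default)]
struct Message {
    headers: Vec<(String, Vec<u8>)>, // Kafka headers: key -> raw bytes
    payload: Vec<u8>,
}

impl Message {
    /// First header with this key, if present and valid UTF-8.
    fn header(&self, key: &str) -> Option<&str> {
        self.headers
            .iter()
            .find(|(k, _)| k == key)
            .and_then(|(_, v)| std::str::from_utf8(v).ok())
    }
}

fn now_ms() -> u64 {
    SystemTime::now().duration_since(UNIX_EPOCH).expect("clock after epoch").as_millis() as u64
}

/// Producer side: attach the trace context and an enqueue timestamp.
fn inject(msg: &mut Message, traceparent: &str, enqueued_at_ms: u64) {
    msg.headers.push(("traceparent".into(), traceparent.as_bytes().to_vec()));
    // Hypothetical header name chosen for this sketch.
    msg.headers.push(("x-enqueued-at-ms".into(), enqueued_at_ms.to_string().into_bytes()));
}

/// Consumer side: recover the trace id and how long the message sat on the queue.
fn extract(msg: &Message, consumed_at_ms: u64) -> Option<(String, Duration)> {
    let tp = msg.header("traceparent")?;
    let trace_id = tp.split('-').nth(1)?.to_string();
    let enqueued: u64 = msg.header("x-enqueued-at-ms")?.parse().ok()?;
    // saturating_sub clamps to zero if producer/consumer clock skew would go negative.
    let waited = Duration::from_millis(consumed_at_ms.saturating_sub(enqueued));
    Some((trace_id, waited))
}

fn main() {
    let mut msg = Message { payload: b"order-created".to_vec(), ..Default::default() };
    inject(&mut msg, "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01", now_ms());
    if let Some((trace_id, waited)) = extract(&msg, now_ms()) {
        println!("{}: trace {trace_id} waited {waited:?} on the queue", String::from_utf8_lossy(&msg.payload));
    }
}
```

Because the consumer's span joins the producer's trace id, the time spent on the queue shows up as a gap inside one trace instead of two unrelated traces.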
SigNoz is newer to the game. I'm glad there are more competitors and vendors using OpenTelemetry natively. We originally talked to some of the big vendors, and they were happy to accept OpenTelemetry data, but they marked every metric as a "custom" metric and would charge out the wazoo for each one, far in excess of whatever was instrumented natively with their APM plugin thingamabob.
The more the better. I love OpenTelemetry, and using it in Rust has been mostly great.
That library you built sounds great. It's the kind of thing whose code I love to read if I'm using it in a project. I was torn about using the `instrument` macro, but decided on manual instrumentation for the demonstration.
Regarding monitoring Kafka execution times, absolutely agreed. In my previous job, monitoring Celery helped us understand consumer bottlenecks: we couldn't see background job traces containing the Celery consumer spans, and when they did appear, they were hours late. The entire trace took 8 hours instead of the expected couple of minutes.
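To make that "8 hours instead of a couple of minutes" symptom easy to spot, one option is to split each background job's end-to-end time into queue wait and processing, and flag jobs over a time budget along with the share spent waiting. A minimal sketch; the `JobTiming` shape, `over_budget` helper, and 5-minute budget are hypothetical, and in practice these timestamps would come from span start/end times rather than a hand-built array.

```rust
use std::time::Duration;

/// Timestamps (ms since epoch) for one background job (hypothetical shape).
#[derive(Debug, Clone, Copy)]
struct JobTiming {
    enqueued_ms: u64,
    started_ms: u64,
    finished_ms: u64,
}

impl JobTiming {
    fn queue_wait(&self) -> Duration {
        Duration::from_millis(self.started_ms.saturating_sub(self.enqueued_ms))
    }
    fn processing(&self) -> Duration {
        Duration::from_millis(self.finished_ms.saturating_sub(self.started_ms))
    }
    fn end_to_end(&self) -> Duration {
        Duration::from_millis(self.finished_ms.saturating_sub(self.enqueued_ms))
    }
}

/// Indices of jobs whose end-to-end time exceeds `budget`, with the fraction spent queued.
fn over_budget(jobs: &[JobTiming], budget: Duration) -> Vec<(usize, f64)> {
    jobs.iter()
        .enumerate()
        .filter(|(_, j)| j.end_to_end() > budget)
        .map(|(i, j)| {
            let total = j.end_to_end().as_secs_f64();
            let share = if total > 0.0 { j.queue_wait().as_secs_f64() / total } else { 0.0 };
            (i, share)
        })
        .collect()
}

fn main() {
    let jobs = [
        // 30 s on the queue, 90 s of work: inside a 5-minute budget.
        JobTiming { enqueued_ms: 0, started_ms: 30_000, finished_ms: 120_000 },
        // ~8 h on the queue, 2 min of work: the kind of case described above.
        JobTiming { enqueued_ms: 0, started_ms: 28_680_000, finished_ms: 28_800_000 },
    ];
    for (i, share) in over_budget(&jobs, Duration::from_secs(5 * 60)) {
        let j = jobs[i];
        println!(
            "job {i}: queued {:?}, processed {:?} ({:.1}% of total was queue wait)",
            j.queue_wait(),
            j.processing(),
            share * 100.0
        );
    }
}
```

A high queue-wait share points at consumer capacity (too few workers, or a stuck consumer) rather than slow job code.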
Happy to hear you've been enjoying OTel and Rust!