Comment by hmpc

15 hours ago

You might've misunderstood the requirements. The timescale was 1-10 microseconds per component; 100 ns was the per-span overhead we were aiming for.

In this case distributed tracing absolutely was the right choice. These were not simple computational tasks. The components were highly stateful and interconnected both on- and cross-host. Between this and the timescale, as well as the volume of events and the dollar-value impact of each potential failure (of which there were many), we needed real-time analysis capabilities, not a profiler.

jeffbee 15 hours ago

I guess my skepticism about the application colored my reading of the rest of it. If it had only said you needed it to be faster, that would have been easier for a simpleton like me.