
Comment by tdullien

10 months ago

As much as the return of frame pointers is a good thing, it's largely unnecessary -- it arrives at a point where multiple eBPF-based profilers are available that do fine using .eh_frame and also manually unwinding high-level language runtime stacks: Both Parca from PolarSignals as well as the artist formerly known as Prodfiler (now Elastic Universal Profiling) do fine.

So this is a solution for a problem, and it arrives just at the moment that people have solved the problem more generically ;)

(Prodfiler coauthor here, we had solved all of this by the time we launched in Summer 2021)

First of all, I think the .eh_frame unwinding y'all pioneered is great.

But I think you're only thinking about CPU profiling at <= 100 Hz / core. However, Brendan's article is also talking about Off-CPU profiling, and as far as I can tell, all known techniques (scheduler tracing, wall clock sampling) require stack unwinding to occur 1-3 orders of magnitude more often than for CPU profiling.
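To make the "1-3 orders of magnitude" point concrete, here is a rough back-of-the-envelope sketch. The per-unwind costs are hypothetical placeholders chosen only for illustration, not measurements of any real unwinder; only the 100 Hz baseline comes from the comment above.

```python
# Back-of-the-envelope unwind budget per core.
# The per-unwind costs below are ASSUMED placeholders, not measurements.
CPU_PROFILE_HZ = 100  # per-core CPU sampling rate mentioned in the comment


def unwinds_per_second(orders_of_magnitude: int) -> int:
    """Unwind rate when sampling 10**n times more often than CPU profiling."""
    return CPU_PROFILE_HZ * 10 ** orders_of_magnitude


def overhead_fraction(unwinds_hz: int, cost_us: float) -> float:
    """Fraction of one core spent unwinding at the given rate and cost."""
    return unwinds_hz * cost_us / 1_000_000


# Hypothetical cost per stack unwind, in microseconds.
ASSUMED_COST_US = {"cheap unwind": 1.0, "expensive unwind": 20.0}

for oom in range(0, 4):
    rate = unwinds_per_second(oom)
    for label, cost in ASSUMED_COST_US.items():
        share = overhead_fraction(rate, cost)
        print(f"{rate:>7} unwinds/s, {label}: {share:.2%} of a core")
```

Whatever the real costs are, the overhead scales linearly with the unwind rate, so a per-unwind cost that is negligible at 100 Hz can dominate at 1,000x that rate.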

For those use cases, I don't think .eh_frame unwinding will be good enough, at least not for continuous profiling. E.g. see [1][2] for an example of how frame pointer unwinding allowed the Go runtime to lower execution tracing overhead from 10-20% to 1-2%, even though it was already using a relatively fast lookup table approach.

[1] https://go.dev/blog/execution-traces-2024

[2] https://blog.felixge.de/reducing-gos-execution-tracer-overhe...
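For readers unfamiliar with why frame pointer unwinding is cheap, here is a toy model, not code from the Go runtime or any profiler. It assumes an x86-64-style layout where the word at the frame pointer holds the caller's saved frame pointer and the next word holds the return address; the `memory` dictionary and all addresses are made up for illustration.

```python
from __future__ import annotations

# Toy model of frame-pointer unwinding over a fake stack.
# Layout ASSUMPTION (x86-64 style): memory[fp] is the caller's saved frame
# pointer and memory[fp + WORD] is the return address into the caller.
WORD = 8


def unwind_frame_pointers(memory: dict[int, int], fp: int,
                          max_depth: int = 64) -> list[int]:
    """Follow the saved-frame-pointer chain and collect return addresses."""
    return_addresses = []
    while fp != 0 and len(return_addresses) < max_depth:
        return_addresses.append(memory[fp + WORD])
        next_fp = memory[fp]
        # The stack grows down, so a caller's frame must sit at a higher
        # address; anything else means the chain is broken.
        if next_fp != 0 and next_fp <= fp:
            break
        fp = next_fp
    return return_addresses


# Hypothetical call chain: main -> outer -> inner (all addresses invented).
memory = {
    0x1000: 0x1020, 0x1008: 0x401234,  # inner's frame, returns into outer
    0x1020: 0x1040, 0x1028: 0x401100,  # outer's frame, returns into main
    0x1040: 0x0,    0x1048: 0x401000,  # main's frame, end of the chain
}
trace = unwind_frame_pointers(memory, fp=0x1000)
print([hex(addr) for addr in trace])
```

Each frame costs two loads and a comparison. A table-driven unwinder additionally has to look up the unwind rule for each program counter before it can find the caller's frame, which is where the extra per-frame cost comes from.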

I'm under the impression that eh_frame stack traces are much slower than frame pointer stack traces, which makes always-on profiling, such as seen in tcmalloc, impractical.

PolarSignals is specifically discussed in the linked threads, and they conclude that their approach is not good enough for perf reasons.

Also I've heard that the whole .eh_frame unwinding is more fragile than a simple frame pointer. I've seen enough broken stack traces myself, but honestly I never tried if -fno-omit-frame-pointer would have helped.

  • Yes and no. A simple frame pointer needs to be present in all libraries, and depending on build settings, this might not be the case. .eh_frame tends to be emitted almost everywhere...

    So it's both similarly fragile, but one is almost never disabled.

    The broader point is: For HLL runtimes you need to be able to switch between native and interpreted unwinds anyhow, so you'll always do some amount of lifting in eBPF land.

    And yes, having frame pointers removes a lot of complexity, so it's net a very good thing. It's just that the situation wasn't nearly as dire as described, because people that care about profiling had built solutions.

    • Forget eBPF even -- why do the job of userspace in the kernel? Instead of unwinding via eBPF, we should ask userspace to unwind itself using a synchronous signal delivered to userspace whenever we've requested a stack sample.
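A minimal sketch of the "let userspace unwind itself on a signal" idea, using the Python interpreter's own stack as a stand-in for a native stack. This is only an analogy: `SIGUSR1` as the sample-request signal, the handler name, and raising the signal from inside the process are all assumptions for illustration (a real profiler would deliver the signal from outside). It requires a Unix-like system and must run in the main thread.

```python
import signal

# Each sample is a list of function names, innermost frame first.
samples = []


def on_sample_request(signum, frame):
    """Signal handler: the process walks its own interrupted stack."""
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    samples.append(names)


def inner():
    # Stand-in for an external profiler asking for a stack sample.
    signal.raise_signal(signal.SIGUSR1)


def outer():
    inner()


# SIGUSR1 is an arbitrary choice for this sketch.
signal.signal(signal.SIGUSR1, on_sample_request)
outer()
print(samples[0])
```

The handler runs inside the sampled process, so it can use whatever unwinding knowledge the runtime already has instead of reimplementing it in the kernel.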


You mean we don't need accessible profiling in free software because there are companies selling it to us. Cool.

  • Parca is open-source, Prodfiler's eBPF code is GPL, and the rest of Prodfiler is currently going through OTel donation, so my point is: There are now multiple FOSS implementations of a more generic and powerful technique.

If you're sufficiently in control of your deployment details to ensure that BPF is available at all. CAP_SYS_PTRACE is available ~everywhere for everyone.