Comment by brancz

10 months ago

The way perf does it is slow, as the entire stack is copied into user-space and is then asynchronously unwound.
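To put a rough number on that cost, here is a back-of-the-envelope sketch. It assumes perf's default `--call-graph dwarf` dump of 8192 bytes of user stack per sample, and a sampling rate of 99 Hz on 64 busy CPUs; the rate and CPU count are illustrative assumptions, not measurements.

```python
# Rough cost of perf-style DWARF unwinding, where each sample copies a raw
# chunk of the user stack out of the kernel so it can be unwound later.
# 8192 bytes is the default dump size of `perf record --call-graph dwarf`;
# the sampling rate and busy CPU count are illustrative assumptions.
def stack_copy_rate(dump_size: int, sample_hz: int, busy_cpus: int) -> int:
    """Bytes of raw stack copied to user space per second."""
    return dump_size * sample_hz * busy_cpus


rate = stack_copy_rate(dump_size=8192, sample_hz=99, busy_cpus=64)
print(f"{rate / 2**20:.1f} MiB/s of raw stack copies")  # 49.5 MiB/s of raw stack copies
```

That is data copied per second before a single frame has been unwound.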

This is solvable, as Brendan calls out. We've created an eBPF-based profiler at Polar Signals that essentially does what you said: it optimizes the unwind tables, caches them in BPF maps, and then unwinds synchronously instead of copying the whole stack into user-space.
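A minimal sketch of table-driven synchronous unwinding, under loudly stated assumptions: the table layout (`UnwindRow`, `UnwindTable`) is hypothetical and is not Polar Signals' actual format, a Python dict stands in for the target's stack memory, and only the few x86_64 rules needed here are modelled (CFA computed from `rsp` or `rbp`, return address in the word at CFA - 8). In an eBPF profiler the table would live in BPF maps and the walk would run in the kernel; here it is plain Python.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class UnwindRow:
    """One row of a hypothetical compact unwind table for x86_64."""
    start_pc: int              # first PC this rule applies to
    cfa_reg: str               # "rsp" or "rbp": register the CFA is based on
    cfa_offset: int            # CFA = cfa_reg + cfa_offset
    rbp_offset: Optional[int]  # caller's rbp saved at CFA + rbp_offset, or None


class UnwindTable:
    def __init__(self, rows: List[UnwindRow]):
        self.rows = sorted(rows, key=lambda r: r.start_pc)
        self.starts = [r.start_pc for r in self.rows]

    def lookup(self, pc: int) -> Optional[UnwindRow]:
        # Binary search for the last row starting at or before pc.
        i = bisect.bisect_right(self.starts, pc) - 1
        return self.rows[i] if i >= 0 else None


def unwind(table: UnwindTable, memory: Dict[int, int],
           rip: int, rsp: int, rbp: int, max_frames: int = 64) -> List[int]:
    """Unwind synchronously, reading only the few stack words each rule needs."""
    frames: List[int] = []
    while rip and len(frames) < max_frames:
        frames.append(rip)
        # Return addresses point after the call, so callers are looked up at rip - 1.
        row = table.lookup(rip if len(frames) == 1 else rip - 1)
        if row is None:
            break
        cfa = (rsp if row.cfa_reg == "rsp" else rbp) + row.cfa_offset
        if row.rbp_offset is not None:
            rbp = memory.get(cfa + row.rbp_offset, 0)
        # On x86_64 the return address sits in the word just below the CFA.
        rip, rsp = memory.get(cfa - 8, 0), cfa
    return frames


# Toy process: main (with a frame pointer) calls foo (a leaf without one).
table = UnwindTable([
    UnwindRow(0x1000, "rbp", 16, -16),  # main, after `push rbp; mov rbp, rsp`
    UnwindRow(0x2000, "rsp", 8, None),  # foo, no frame set up
])
memory = {0x7FF0: 0x1050,  # return address into main, pushed by the call
          0x7FF8: 0,       # main's saved rbp (outermost frame)
          0x8000: 0}       # main's return address: 0 ends the walk
frames = unwind(table, memory, rip=0x2010, rsp=0x7FF0, rbp=0x7FF8)
print([hex(pc) for pc in frames])  # ['0x2010', '0x1050']
```

Only three stack words are read for two frames, which is the point of unwinding in place rather than shipping a raw stack copy to user-space.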

It should also be said that you need some sort of DWARF-like information to understand inlining. If I have a function A that inlines B, which in turn inlines C, I'd often like to know that C is what takes a lot of the time, and with frame pointers alone that information is lost.
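A toy illustration of why, with made-up addresses: a frame-pointer walk recovers one return address per *physical* call, so a sample inside C's code (inlined into A via B) is attributed to A. Getting C and B back needs an extra mapping from PC ranges to inline chains; the `SYMBOLS` and `INLINES` tables below are hypothetical stand-ins for what DWARF debug info (`DW_TAG_inlined_subroutine` entries plus line information) provides.

```python
from typing import Dict, List, Tuple

# Hypothetical symbol table: one entry per *physical* function (lo, hi, name).
SYMBOLS: List[Tuple[int, int, str]] = [(0x1000, 0x1200, "A"), (0x3000, 0x3100, "main")]
# Hypothetical inline ranges: code from B and C inlined into A,
# given as (lo, hi, inline chain innermost first).
INLINES: List[Tuple[int, int, List[str]]] = [(0x1080, 0x10C0, ["C", "B"])]


def walk_frame_pointers(memory: Dict[int, int], rip: int, rbp: int,
                        max_frames: int = 64) -> List[int]:
    """x86_64 frame-pointer walk: [rbp] = caller's rbp, [rbp + 8] = return address."""
    pcs = [rip]
    while rbp and len(pcs) < max_frames:
        ret = memory.get(rbp + 8, 0)
        if not ret:
            break
        pcs.append(ret)
        rbp = memory.get(rbp, 0)
    return pcs


def symbolize(pc: int, with_inlines: bool) -> List[str]:
    name = next((n for lo, hi, n in SYMBOLS if lo <= pc < hi), "??")
    if not with_inlines:
        return [name]
    chain = next((c for lo, hi, c in INLINES if lo <= pc < hi), [])
    return chain + [name]


# Sample taken inside C's code, which the compiler inlined into A (via B).
memory = {0x7FE0: 0x7FF0, 0x7FE8: 0x3050,  # A's frame: saved rbp, return into main
          0x7FF0: 0, 0x7FF8: 0}           # main's frame: end of the chain
pcs = walk_frame_pointers(memory, rip=0x1090, rbp=0x7FE0)
flat = [f for pc in pcs for f in symbolize(pc, with_inlines=False)]
full = [f for pc in pcs for f in symbolize(pc, with_inlines=True)]
print(flat)  # ['A', 'main']
print(full)  # ['C', 'B', 'A', 'main']
```

Both walks see the same two PCs; only the inline-aware symbolization recovers C and B.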

  • Inlined functions can be symbolized using DWARF line information[0], while unwinding requires DWARF unwind information (CFI), which the x86_64 ABI mandates in the `.eh_frame` section of every ELF.

    - [0] This line information may or may not be present in an executable, but luckily there's debuginfod (https://sourceware.org/elfutils/Debuginfod.html).

This conveniently sidesteps the whole issue of getting DWARF data in the first place, which is still a broken, disjointed mess on Linux. Hell, Windows solved this many, many years ago.

  • You'd need a pretty special distro to have enabled `-fno-asynchronous-unwind-tables` by default in its toolchain.

    By default, on most Linux distros the frame tables are built into all binaries and end up in the GNU_EH_FRAME segment, which is always available in any running process. Doesn't sound like a broken and disjointed mess to me. Sounds more like a smoothly running, solved problem.
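One way to check this on a given binary is to look for the `GNU_EH_FRAME` program header (and the `.eh_frame` section) directly. A minimal sketch, assuming a 64-bit little-endian ELF file and ignoring extended section numbering; the helper names are hypothetical, and only `struct` is used to read the ELF header, program headers, and section headers. `readelf -lS` reports the same information.

```python
import struct
import sys
from typing import List, Optional, Tuple

PT_GNU_EH_FRAME = 0x6474E550  # segment type that readelf prints as GNU_EH_FRAME


def _ehdr(data: bytes) -> tuple:
    """Unpack the 64-byte ELF64 header (this sketch handles little-endian only)."""
    if len(data) < 64 or data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    return struct.unpack_from("<16sHHIQQQIHHHHHH", data, 0)


def gnu_eh_frame_segment(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (vaddr, memsz) of the GNU_EH_FRAME segment, or None if absent."""
    eh = _ehdr(data)
    phoff, phentsize, phnum = eh[5], eh[9], eh[10]
    for i in range(phnum):
        fields = struct.unpack_from("<IIQQQQQQ", data, phoff + i * phentsize)
        p_type, vaddr, memsz = fields[0], fields[3], fields[6]
        if p_type == PT_GNU_EH_FRAME:
            return vaddr, memsz
    return None


def section_names(data: bytes) -> List[str]:
    """Names of all sections, e.g. to check for .eh_frame or .debug_line."""
    eh = _ehdr(data)
    shoff, shentsize, shnum, shstrndx = eh[6], eh[11], eh[12], eh[13]
    if shoff == 0 or shnum == 0:
        return []  # section headers stripped (or extended numbering, not handled)

    def shdr(i: int) -> tuple:
        return struct.unpack_from("<IIQQQQIIQQ", data, shoff + i * shentsize)

    strtab_off = shdr(shstrndx)[4]  # sh_offset of the section-name string table
    names = []
    for i in range(shnum):
        start = strtab_off + shdr(i)[0]  # sh_name is an offset into that table
        names.append(data[start:data.index(b"\0", start)].decode())
    return names


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        elf = f.read()
    print("GNU_EH_FRAME:", gnu_eh_frame_segment(elf))
    print(".eh_frame present:", ".eh_frame" in section_names(elf))
```

Run it with a path argument (for example `/bin/ls`) to print the segment's address and size and whether `.eh_frame` is present.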