Comment by loeg

1 year ago

Brendan mentions DWARF unwinding, actually, and briefly mentions why he considers it insufficient.

loeg

The biggest objection seems to be the Java/JIT case. eh_frame supports a "personality function" which is AIUI basically a callback for performing custom unwinding. If the personality function could also support custom logic for producing backtraces, then the profiling sampler could effectively read the JVM's own metadata about the JIT'ted code, which I assume it must have in order to produce backtraces for the JVM itself.

loeg 1 year ago
This also seems like a big objection:
> The overhead to walk DWARF is also too high, as it was designed for non-realtime use.
- kouteiheika 1 year ago
  
  Not a problem in practice. The way you solve it is to just translate DWARF into a simpler representation that doesn't require you to walk anything. (But I understand why people don't want to do it. DWARF is insanely complex and annoying to deal with.)
  Source: I wrote multiple profilers.
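
To illustrate the idea above of translating DWARF into a simpler representation, here is a minimal sketch. It assumes the unwind rules from .eh_frame were flattened offline into a sorted table of rows, so that unwinding one frame at sample time is a binary search plus a few loads. The row layout, the names (`UnwindRow`, `lookup`, `step`), and the table contents are hypothetical, not taken from any particular profiler, and real tables have to cover more registers and rule kinds.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class UnwindRow(NamedTuple):
    """One precomputed rule (hypothetical layout), valid from start_pc up to the next row."""
    start_pc: int               # first instruction address this row covers
    cfa_reg: str                # register the CFA is based on: "rsp" or "rbp"
    cfa_offset: int             # CFA = regs[cfa_reg] + cfa_offset
    ra_offset: int              # return address is stored at CFA + ra_offset
    rbp_offset: Optional[int]   # caller's rbp is at CFA + rbp_offset, or None if not saved


# Made-up rows for a function with the usual x86-64 prologue
# (push rbp; mov rbp, rsp), as if generated offline from DWARF CFI.
TABLE = [
    UnwindRow(0x1000, "rsp", 8, -8, None),    # at function entry
    UnwindRow(0x1001, "rsp", 16, -8, -16),    # after push rbp
    UnwindRow(0x1004, "rbp", 16, -8, -16),    # after mov rbp, rsp
    UnwindRow(0x2000, "rsp", 8, -8, None),    # entry of the next function
]
STARTS = [row.start_pc for row in TABLE]


def lookup(pc):
    """Binary-search the row covering pc; no DWARF program is interpreted here."""
    i = bisect_right(STARTS, pc) - 1
    return TABLE[i] if i >= 0 else None


def step(regs, memory):
    """Unwind one frame. regs holds pc/rsp/rbp; memory maps address -> 8-byte word."""
    row = lookup(regs["pc"])
    if row is None:
        return None
    cfa = regs[row.cfa_reg] + row.cfa_offset
    ret = memory.get(cfa + row.ra_offset)
    if ret is None:
        return None
    rbp = regs["rbp"]
    if row.rbp_offset is not None:
        rbp = memory.get(cfa + row.rbp_offset)
        if rbp is None:
            return None
    # In the caller, rsp equals the CFA and pc is the return address.
    return {"pc": ret, "rsp": cfa, "rbp": rbp}
```

The point of the design is that the expensive part (interpreting CFI) happens once per binary, while the per-sample work stays small and bounded.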
  
- menaerus 1 year ago
  
  From https://fzn.fr/projects/frdwarf/frdwarf-oopsla19.pdf:
  > DWARF-based unwinding can be a bottleneck for time-sensitive program analysis tools. For instance the perf profiler is forced to copy the whole stack on taking each sample and to build the backtraces offline: this solution has a memory and time overhead but also serious confidentiality and security flaws.
  So if I understand correctly, the problem with DWARF is that building the backtrace online (on each sample) is expensive compared to walking frame pointers. That cost can be mitigated by copying the stack at sample time and building the backtrace offline.
  However, the paper also mentions:
  > Similarly, the Linux kernel by default relies on a frame pointer to provide reliable backtraces. This incurs in a space and time overhead; for instance it has been reported (https://lwn.net/Articles/727553/) that the kernel’s .text size increases by about 3.2%, resulting in a broad kernel-wide slowdown.
  and
  > Measurements have shown a slowdown of 5-10% for some workloads (https://lore.kernel.org/lkml/20170602104048.jkkzssljsompjdwy@suse.de/T/#u).
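
For contrast with DWARF unwinding, here is a minimal sketch of why the frame-pointer walk is cheap enough to run on every sample. It assumes the conventional x86-64 layout where the saved caller frame pointer sits at [rbp] and the return address at [rbp + 8]. The `memory` dict standing in for the sampled stack and the function name are hypothetical.

```python
def walk_frame_pointers(pc, fp, memory, max_depth=64):
    """Follow the chain of saved frame pointers and collect return addresses.

    memory is a stand-in for the sampled stack: it maps an address to the
    8-byte word stored there. Layout assumed: [fp] = caller's fp,
    [fp + 8] = return address into the caller.
    """
    frames = [pc]
    for _ in range(max_depth - 1):
        saved_fp = memory.get(fp)
        ret = memory.get(fp + 8)
        if saved_fp is None or ret is None or ret == 0:
            break  # unreadable slot or end of the chain
        frames.append(ret)
        if saved_fp <= fp:
            break  # the stack grows down, so caller frames must be at higher addresses
        fp = saved_fp
    return frames
```

Each frame costs two loads and needs no lookup tables, which is the trade-off against the code size and slowdown figures quoted above.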
- haberman 1 year ago
  
  But that one has at least some potential mitigation. Per his analysis, the Java/JIT case is the only one that has no mitigation:
  > Javier Honduvilla Coto (Polar Signals) did some interesting work using an eBPF walker to reduce the overhead, but...Java.