Comment by haberman
10 months ago
The biggest objection seems to be the Java/JIT case. eh_frame supports a "personality function" which is AIUI basically a callback for performing custom unwinding. If the personality function could also support custom logic for producing backtraces, then the profiling sampler could effectively read the JVM's own metadata about the JIT'ted code, which I assume it must have in order to produce backtraces for the JVM itself.
This also seems like a big objection:
> The overhead to walk DWARF is also too high, as it was designed for non-realtime use.
Not a problem in practice. The way you solve it is to just translate DWARF into a simpler representation that doesn't require you to walk anything. (But I understand why people don't want to do it. DWARF is insanely complex and annoying to deal with.)
Source: I wrote multiple profilers.
For a busy 64-CPU production JVM, I tested Google's Java symbol logging agent that just logged timestamp, symbol, address, size. The c2 compiler was so busy, constantly, that the overhead of this was too high to be practical (beyond startup analysis). And all this was generating was a timestamp log to do symbol lookup. For DWARF to walk stacks there's a lot more steps, so while I could see it work for light workloads I doubt it's practical for the heavy production workloads I typically analyze. What do you think? Have you tested on a large production server where c2 is a measurable portion of CPU constantly, the code cache is >1Gbyte and under heavy load?
1 reply →
In this thread[1] we're discussing problems with using DWARF directly for unwinding, not possible translations of the metadata into other formats (like ORC or whatever).
[1]: https://news.ycombinator.com/item?id=39732010
1 reply →
From https://fzn.fr/projects/frdwarf/frdwarf-oopsla19.pdf
So if I get this correctly, the problem with DWARF is that building the backtrace online (on each sample) in comparison to frame pointers is an expensive operation which, however, can be mitigated by building the backtrace offline at the expense of copying the stack.
However, paper also mentions
and
But that one has at least some potential mitigation. Per his analysis, the Java/JIT case is the only one that has no mitigation:
> Javier Honduvilla Coto (Polar Signals) did some interesting work using an eBPF walker to reduce the overhead, but...Java.