← Back to context

Comment by brenns10

10 months ago

Using DWARF is annoying with perf because the kernel doesn't support stack unwinding with DWARF, and never will. The kernel has to be the one which unwinds the user space stacks, because it is the one managing the perf counters and handling the interrupts.

Since it can't unwind the user space stacks, the kernel has to copy the entire stack (8192 bytes by default) into the output file (perf.data) and then the perf user space program will unwind it later. It does this for each sample, which is usually hundreds of times per second, per CPU. Though it depends how you configured the collection.

That does have a significant overhead: first, runtime overhead: copying 8k bytes, hundreds of times per second, and writing it to disk, all don't come for free. You spend quite a bit of CPU time doing the memcpy operation which consumes memory bandwidth too. You also frequently need to increase the size of the perf memory buffer to accommodate all this data while it waits for user space to write it to disk. Second, disk space overhead, since the 8k stack bytes per sample are far larger than the stack trace would be. And third, it does require that you install debuginfo packages to get the DWARF info, which is usually a pain to do on production machines, and they consume a lot of disk space on their own.

Many of these overheads aren't too bad in simple cases (lower sample rates, fewer CPUs, or targeting a single task). But with larger machines with hundreds of CPUs, full system collections, and higher frequencies, the overhead can increase exponentially.

I'm not certain I know what you mean by the "slow unwinding path for perf", as there is no faster path for user space when frame pointers are disabled (except Intel LBR as outlined in the blog).