Comment by rwmj

10 months ago

Yes, although the universal mechanisms that have been proposed so far have been quite ridiculous - for example having every program handle a "frame pointer signal" in userspace, which doesn't account for the reality that we need to do frame unwinding thousands of times a second with the least possible overhead. Frame pointers work for most things, and where they don't work (interpreted code) you're often not that interested in performance.

> every program handle a "frame pointer signal" in userspace

Yep. That's my proposal.

> which doesn't account for the reality that we need to do frame unwinding thousands of times a second with the least possible overhead

Yes, it does. The kernel has to return to userspace anyway at some point, and pushing a signal frame during that return is cheap. The cost of signal delivery is the entry into the kernel, and after a perf counter overflow, you've already paid that cost. Why would the actual unwinding be any faster in the kernel than in userspace?

Also, so what if a thread enters the kernel and samples the stack multiple times before returning to userspace? While in the kernel, the userspace stack cannot change --- therefore, it's sufficient to delay userspace stack collection until the kernel returns to userspace anyway.

You might ask "Don't we have to restore the signal mask after handling the profiling signal?"

Not if you don't define the signal to change the signal mask. sigreturn(2) is optional.

  • This sounds vastly more complex already than following a linked list. You've also ignored the other cost which is getting the stack trace data out of the program. Anyway I'm keen to see your implementation and test how it works in reality.