There's always room for improvement. For example, Samply [0] is a wonderful profiler that uses the same APIs as `perf` but unwinds stacks as they arrive, rather than dumping them all to disk and processing them in bulk afterward.
Samply unwinds significantly faster than `perf` because it caches unwind information.
That said, this approach still has limitations: very deep stacks won't be fully unwound, because the chunk of the process stack that the kernel copies into each sample is quite limited in size.
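
For anyone wondering what caching unwind information buys you, and why a limited stack copy cuts deep stacks short, here's a toy sketch in Python. It is not Samply's or `perf`'s actual code: the unwind table, the addresses, the fixed frame layout, and the `CachingUnwinder` class are all made up for illustration. Real unwinders get their rules from things like `.eh_frame`, which is where the expensive lookup would be.

```python
from bisect import bisect_right

# Hypothetical unwind table: (function start address, name, frame size in words).
# A real profiler would derive this from debug/unwind sections; these values are invented.
UNWIND_TABLE = [
    (0x1000, "main", 4),
    (0x2000, "parse", 2),
    (0x3000, "read_token", 3),
]
STARTS = [start for start, _, _ in UNWIND_TABLE]


class CachingUnwinder:
    """Unwinds samples one at a time, remembering the rule found for each pc."""

    def __init__(self):
        self.rule_cache = {}   # pc -> (name, frame_size), or None if unknown
        self.slow_lookups = 0  # how often the "expensive" table search ran

    def rule_for(self, pc):
        if pc not in self.rule_cache:
            self.slow_lookups += 1
            idx = bisect_right(STARTS, pc) - 1
            if idx < 0:
                self.rule_cache[pc] = None  # pc is below every known function
            else:
                _, name, frame_size = UNWIND_TABLE[idx]
                self.rule_cache[pc] = (name, frame_size)
        return self.rule_cache[pc]

    def unwind(self, pc, stack_snapshot):
        """Walk one sample. stack_snapshot[0] is the word at the stack pointer.

        Toy convention: the return address is the last word of each frame,
        and a return address of 0 marks the outermost frame.
        """
        frames = []
        offset = 0
        while pc != 0:
            rule = self.rule_for(pc)
            if rule is None:
                frames.append("<unknown>")
                break
            name, frame_size = rule
            frames.append(name)
            ret_slot = offset + frame_size - 1
            if ret_slot >= len(stack_snapshot):
                # The copied stack ended before we reached the caller's address.
                frames.append("<truncated>")
                break
            pc = stack_snapshot[ret_slot]
            offset += frame_size
        return frames


# Stack for read_token <- parse <- main; 7 stands in for local variables.
FULL_STACK = [7, 7, 0x2010, 7, 0x1020, 7, 7, 7, 0]

if __name__ == "__main__":
    unwinder = CachingUnwinder()
    print(unwinder.unwind(0x3004, FULL_STACK))      # full snapshot
    print(unwinder.unwind(0x3004, FULL_STACK))      # same pcs again, served from cache
    print(unwinder.slow_lookups)
    print(unwinder.unwind(0x3004, FULL_STACK[:4]))  # snapshot cut short
```

The second call hits only the cache, so `slow_lookups` stays at 3. The last call gets a snapshot that ends before `parse`'s return address, so the walk stops there no matter how fast the unwinder is.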
TBH this sounds more like perf's implementation is bad.
I'm waiting for this to happen: https://github.com/open-telemetry/community/issues/1918
- [0]: https://github.com/mstange/samply