Comment by claytonwramsey
10 months ago
That's very interesting to me: I had seen the `[unknown]` mountain in my profiles but never knew why. I think it's a tough thing to justify, though; 2% performance is actually a pretty big difference.
It would be really nice to have fine-grained control over frame pointer inclusion: given fine-grained profiling, we could determine whether we needed frame pointers for a given function or compilation unit. I wouldn't be surprised if we found that only a handful of operations are dramatically slowed by frame pointers while the rest don't really care.
> 2% performance is actually a pretty big difference.
No, it's not, particularly when frame pointers let you identify hotspots via profiling that can net you improvements of 10% or more.
Sure, but how many of the people running distro-compiled code do perf analysis? And how many of the people who need to do perf analysis are unable to use a with-frame-pointers build when they need to? And how many of those 10% perf improvements are in common distro code that gets upstreamed to improve the general user experience, as opposed to being in private application code?
If you're Netflix, then "enable frame pointers" is a no-brainer. But if you're a distro building code for millions of users, many of whom will likely never need to fire up a profiler, I think the question is at least a little trickier. The overall best tradeoff might still end up being to enable frame pointers, but I can see the other side too.
It's not a technical tradeoff; it's a refusal to compromise. Lack of frame pointers prevents many groups from using software built by distros altogether. If a distro decides that it would rather make things go 1% faster for grandma at the cost of alienating thousands of engineers at places like Netflix and Google, who simply want to volunteer millions of dollars of their employers' resources to help distros find 10x performance improvements, then that distro is doing a great disservice to both grandma and itself.
I would say the question here is what the default should be, and from my point of view the answer is clearly "frame pointers".
Code eking out every possible cycle of performance can enable a no-frame-pointer optimization and see if it helps. But it's a bad default for libc, and for the kernel.
You can turn it on or off per function by attaching GCC's `optimize` attribute to the function declaration (although this doesn't work with LLVM).
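A minimal sketch of what per-function control might look like, assuming GCC's `optimize` function attribute with the `no-omit-frame-pointer` and `omit-frame-pointer` option strings (GCC treats these as `-fno-omit-frame-pointer` and `-fomit-frame-pointer`). The function names `sum_keep_fp` and `sum_omit_fp` are hypothetical, chosen only for illustration; Clang does not implement this attribute and ignores it.

```c
#include <assert.h>
#include <stddef.h>

/* GCC-only sketch (hypothetical function): keep the frame pointer here
 * even if the file is compiled with -fomit-frame-pointer, so profilers
 * that unwind via frame pointers can walk through this frame. */
__attribute__((optimize("no-omit-frame-pointer")))
long sum_keep_fp(const long *xs, size_t n)
{
    assert(xs != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += xs[i];
    return total;
}

/* GCC-only sketch (hypothetical function): drop the frame pointer for a
 * single hot function even if the file is compiled with
 * -fno-omit-frame-pointer. */
__attribute__((optimize("omit-frame-pointer")))
long sum_omit_fp(const long *xs, size_t n)
{
    assert(xs != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += xs[i];
    return total;
}
```

Note that GCC's own documentation describes the `optimize` attribute as intended for debugging rather than production code, so the usual approach remains a build-wide `-fno-omit-frame-pointer` or `-fomit-frame-pointer`.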
The `optimize` function attribute causes other, unintended side effects. Its use is banned in the Linux kernel.
The performance cost in your case may be much smaller than 2 per cent.
Don't completely trust the benchmarks on this; they are a bit synthetic and real-world applications tend to produce very different results.
Plus, profiling is important. I was able to speed up various segments of my code by up to 20 per cent by profiling them carefully.
And, at the end of the day, if your application is so sensitive about any loss of performance, you can simply profile your code in your lab using frame pointers, then omit them in the version released to your customers.
> And, at the end of the day, if your application is so sensitive about any loss of performance, you can simply profile your code in your lab using frame pointers, then omit them in the version released to your customers.
That is what should be done, but TFA is about distros shipping code with frame pointers to end users because some developers are too lazy to recompile libc when profiling. Somehow, shipping two different copies of libc, one intended for end users on low-powered devices and one intended for developers, is not even considered.
If you can't introspect the release version of your software, you have no way of determining what the issue is. You're doing pseudo-science and guesswork to try to replicate the issue on a development version of the software. And if you add a few new logging statements to the release version, there's a pretty good chance that simply restarting the software will make the symptom go away.
The measured overhead is slightly less than 1%. There have been some rare historical cases where frame pointers caused performance to blow up, but those have been fixed.
It’s usually a lot less than 2%.