
Comment by menaerus

10 months ago

I believe it's a misrepresentation to say that "actual measured overhead is under 1%". I don't think such a claim can be universally applied because this depends on the very workload you're measuring the overhead with.

FWIW your results don't quite match the measurements from Linux kernel folks, who claim that the overhead is anywhere between 5% and 10%. Source: https://lore.kernel.org/lkml/20170602104048.jkkzssljsompjdwy...

   I didn't preserve the data involved but in a variety of workloads including netperf, page allocator microbenchmark, pgbench and sqlite, enabling framepointer introduced overhead of around the 5-10% mark.

The significance of their results, IMO, is that they measured the impact using PostgreSQL and SQLite. If anything, a DBMS is one of the best ways to really stress a system.

Those are numbers from 7 years ago, so they're beginning to get a bit stale as people put more weight behind having frame pointers and contribute upstream to their compilers to improve the generated code. Much more recent testing puts it at <1%: by the very R.W.M. Jones you're replying to [0], and separately by others like Brendan Gregg [1b], whose post this thread is commenting on (and which includes [1b] in its Appendix), with similar accounts from others over the last couple of years. Oh, and if you use flamegraph, you might want to check the repo for a familiar name.

Some programs, like Python, have reported worse numbers, 2-7% [2], but there is traction on tackling that [1a] (see both rwmj's and brendangregg's replies to sibling comments; they've both done a lot of upstreamed work w.r.t. frame pointers, performance, and profiling).

As has been frequently pointed out, the benefits from improved profiling cannot be overstated: even a 10% cost to having frame pointers can be well worth it when you leverage that information to target the actual bottlenecks that are eating up your cycles. Plus, you can always disable it in specific hotspots later when needed, which is much easier than the reverse.
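
For anyone unfamiliar with why that information is so cheap to collect: with frame pointers, every stack frame begins with the caller's saved frame pointer, so a profiler can walk the stack as a plain linked list instead of interpreting DWARF unwind tables. A rough sketch of the idea for x86-64/GCC follows; it's only a sketch, the function names are made up, and a real profiler would validate every pointer before following it:

    /* Rough sketch of a frame-pointer stack walk on x86-64 (System V ABI).
     * Assumes the whole call path was built with -fno-omit-frame-pointer;
     * a real profiler would sanity-check each pointer before dereferencing it.
     * Build: gcc -O2 -fno-omit-frame-pointer -fno-optimize-sibling-calls walk.c */
    #include <stdio.h>

    struct frame {
        struct frame *prev;   /* saved %rbp of the caller        */
        void         *ret;    /* return address pushed by `call` */
    };

    __attribute__((noinline)) static void capture(void)
    {
        /* With frame pointers, __builtin_frame_address(0) is this function's
         * %rbp, which points at the caller's saved %rbp, then the return address. */
        struct frame *fp = __builtin_frame_address(0);
        for (int depth = 0; fp && depth < 4; depth++) {   /* stay in our own frames */
            printf("#%d  return address %p\n", depth, fp->ret);
            fp = fp->prev;                                /* follow the frame chain */
        }
    }

    __attribute__((noinline)) static void inner(void) { capture(); }
    __attribute__((noinline)) static void outer(void) { inner(); }

    int main(void)
    {
        outer();
        return 0;
    }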

Something, something, premature optimisation -- though in seriousness, this information benefits actual optimisation. We don't yet have the information and understanding that would allow truly universal claims, precisely because things like this haven't been available, and so haven't been widely used. We know frame pointers can be a detriment in certain hotspots, through the additional register pressure and the extended function prologue/epilogue; that's why we have granular control. But without them we often don't know which hotspots are actually affected, so I'm sure even the databases would benefit... though the "my database is the fastest database" problem has always been the result of endless micro-benchmarking, rather than actual end-to-end program performance and latency, so even a claimed "10%" drop there probably doesn't impact actual real-world usage. That's a reason why some of the most interesting profiling work lately has come from ideas like causal profilers and continuous profilers, which answer exactly that question.
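
On that granular control: with GCC, for example, you can keep frame pointers for the whole build and opt a single, measured hotspot out of them, which is much easier than doing it the other way around. A hedged sketch; the function and workload here are invented, and the attribute is GCC-specific (in practice you might simply compile that one translation unit with -fomit-frame-pointer instead):

    /* Whole build compiled with -fno-omit-frame-pointer; this one function,
     * shown by profiling to be genuinely frame-pointer-bound, opts back out.
     * GCC's optimize attribute applies -fomit-frame-pointer to just this body. */
    #include <stdio.h>
    #include <stddef.h>

    __attribute__((optimize("omit-frame-pointer")))
    static double dot(const double *a, const double *b, size_t n)
    {
        double acc = 0.0;
        for (size_t i = 0; i < n; i++)
            acc += a[i] * b[i];   /* %rbp freed for the register allocator here */
        return acc;
    }

    int main(void)
    {
        double x[] = {1, 2, 3}, y[] = {4, 5, 6};
        printf("%g\n", dot(x, y, 3));   /* 1*4 + 2*5 + 3*6 = 32 */
        return 0;
    }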

[0]: https://rwmj.wordpress.com/2023/02/14/frame-pointers-vs-dwar...
[1a]: https://pagure.io/fesco/issue/2817#comment-826636
[1b]: https://pagure.io/fesco/issue/2817#comment-826805
[2]: https://discuss.python.org/t/the-performance-of-python-with-...

  • While improved profiling is useful, achieving it by wasting a register is annoying, because it is just a very dumb solution.

    The choice Intel made when they designed the 8086, to use 2 separate registers for the stack pointer and the frame pointer, was a big mistake.

    It is very easy to use a single register as both the stack pointer and the frame pointer, as is standard, for instance, on IBM POWER.

    Unfortunately, on Intel/AMD CPUs using a single register is difficult, because the simplest implementation is unreliable: an interrupt may arrive between the 2 instructions that must form an atomic sequence, and its handler may clobber the stack after the old frame pointer value has been written but before the new space has been allocated.

    It would have been very easy to correct this in new CPUs by detecting that instruction sequence and blocking the interrupts between them.

    Intel already did this once, early in the history of the x86 CPUs, when they discovered a mistake in the design of the ISA: interrupts could occur between updating the stack segment and updating the stack pointer. They corrected it by detecting such an instruction sequence and blocking interrupts at the boundary between those instructions.

    The same could be done now, to enable the use of the stack pointer also as the frame pointer. (This would be done by always saving the old stack pointer at the top of the stack whenever stack space is allocated, so that the stack pointer always points to the previous frame pointer, i.e. to the start of the linked list containing all stack frames.)
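
    A hedged toy model of that scheme, in C rather than machine code (all names invented): the "back-chain" word written at the top of every new frame is what makes the stack pointer alone the head of the linked list of frames. POWER does the store and the stack-pointer update in a single instruction (stdu r1,-N(r1)); as two separate x86-style instructions it is exactly the interrupt window described above.

        /* Toy model of the single-register scheme: each frame begins with a
         * back-chain word holding the previous stack pointer, so the stack
         * pointer alone identifies the whole chain of frames. */
        #include <stdio.h>
        #include <stdint.h>

        #define STACK_WORDS 64

        static uintptr_t stack_mem[STACK_WORDS];
        static uintptr_t *sp = stack_mem + STACK_WORDS;   /* stack grows downward */

        static void prologue(size_t frame_words)
        {
            /* On hardware this would be: 1) store the current sp just below the
             * top of the stack, 2) move sp down over it. An interrupt between
             * 1) and 2) could clobber the word written in 1); POWER's stdu does
             * both steps in one instruction, which avoids the problem. */
            *(sp - frame_words) = (uintptr_t)sp;   /* 1) write the back-chain word */
            sp -= frame_words;                     /* 2) allocate the frame        */
        }

        static void epilogue(void)
        {
            sp = (uintptr_t *)*sp;                 /* pop by following the chain */
        }

        static void walk(void)
        {
            /* Unwind using nothing but sp and the back-chain words. */
            for (uintptr_t *fp = sp; fp != stack_mem + STACK_WORDS;
                 fp = (uintptr_t *)*fp)
                printf("frame at offset %td\n", fp - stack_mem);
        }

        int main(void)
        {
            prologue(8);    /* pretend three nested calls happened */
            prologue(4);
            prologue(16);
            walk();
            epilogue();
            epilogue();
            epilogue();
            return 0;
        }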

  • I'd prefer discussing the technical merits of a given approach rather than who is who and who did what, since that leads to an appeal-to-authority fallacy.

    You're correct that the results might be stale, although I wouldn't hold my breath, since as far as I understand there has been no fundamental change in how frame pointers are handled. Perhaps there have been smaller improvements in compiler technology, but CPUs have not undergone any significant change in that regard.

    That said, nowhere in this thread have we seen a dispute of those Linux kernel results other than categorically rejecting them as being "microbenchmarks", which they are not.

    > though the "my database is the fastest database" problem has always been the result of endless micro-benchmarking, rather than actual end-to-end program performance and latency

    Quite the opposite. All database benchmarks are end-to-end program performance and latency analysis. "Cheating" in database benchmarks is done elsewhere.

  • > As has been frequently pointed out, the benefits from improved profiling cannot be overstated: even a 10% cost to having frame pointers can be well worth it when you leverage that information to target the actual bottlenecks that are eating up your cycles.

    Few can leverage that information, because the open source software you are talking about lacks telemetry in the self-hosted case.

    The profiling issue really comes down to the cultural opposition in these communities to collecting telemetry and opening it up for anyone to see and use. The average user struggles to ally with a trustworthy actor who will share information like profiles freely and anonymize it at a per-user level, the level that is actually useful. Such things exist, like the Linux hardware site, but only because they have not attracted the attention of agitators.

    Basically users are okay with profiling, so long as it is quietly done by Amazon or Microsoft or Google, and not by the guy actually writing the code and giving it out for everyone to use for free. It’s one of the most moronic cultural trends, and blame can be put squarely on product growth grifters who equate telemetry with privacy violations; on open source maintainers, who have enough responsibilities as is without also having to educate their users; and on Apple, who have made their essentially vaporous claims about privacy a central part of their brand.

    Of course people know the answer to your question. Why doesn’t Google publish every profile of every piece of open source software? What exactly is sensitive about their workloads? Meta publishes a whole library about every single one of its customers, for anyone to freely read. I don’t buy into the holiness of the backend developer’s “cleverness” or whatever is deemed sensitive, and it’s so hypocritical.

    • > Basically users are okay with profiling, so long as it is quietly done by Amazon or Microsoft or Google, and not by the guy actually writing the code and giving it out for everyone to use for free.

      No; the groups are approximately "cares whether software respects the user, including privacy", or "doesn't know or doesn't care". I seriously doubt that any meaningful number of people are okay with companies invading their privacy but not smaller projects.

    • "Agitators". We don't trust telemetry precisely because of comments like that. World is full of people like you who apparently see absolutely nothing wrong with exfiltrating identifying information from other people's computers. We have to actively resist such attempts, they are constant, never ending and it only seems to get worse over time but you dismiss it all as "cultural opposition" to telemetry.

      For the record I'm NOT OK with being profiled, measured or otherwise studied in any way without my explicit consent. That even extends to the unethical human experiments that corporations run on people and which they euphemistically call A/B tests. I don't care if it's Google or a hobbyist developer, I will block it if I can and I will not lose a second of sleep over it.


    • I think the kind of profiling information you're imagining is a little different from what I am.

      Continuous profiling of your system that gets relayed to someone else by telemetry is very different from continuous profiling of your own system, handled only by yourself (or, generalising, your community/group/company). You seem to be imagining we're operating more in the former, whereas I am imagining more in the latter.

      When it's our own system, better instrumented for our own uses, and we're the only ones getting the information, then there's nothing to worry about, and we can get much more meaningful and informative profiling done when more information about the system is available. I don't even need telemetry. When it's "someone else's" system, in other words, when we have no say in telemetry (or have to exercise a right to opt-out, rather than a more self-executing contract around opt-in), then we start to have exactly the kinds of issues you're envisaging.

      When it's not completely out of our hands, then we need to recognise different users, different demands, different contexts. Catering to the user matters, and when it comes to sensitive information, well, people have different priorities and threat models.

      If I'm opening a calendar on my phone, I don't expect it to be heavily instrumented and relaying all of that, I just want to see my calendar. When I open a calendar on my phone, and it is unreasonably slow, then I might want to submit relevant telemetry back in some capacity. Meanwhile, if I'm running the calendar server, I'm absolutely wanting to have all my instrumentation available and recording every morsel I reasonably can about that server, otherwise improving it or fixing it becomes much harder.

      From the other side, if I'm running the server, I may want telemetry from users, but if it's not essential, then I can "make do" with only the occasional opt-in telemetry. I also have other means of profiling real usage, not just scooping it all up from unknowing users (or begrudging users). Those often have some other "cost", but in turn, they don't have the "cost" of demanding it from users. For people to freely choose requires acknowledging the asymmetries present, and that means we can't just take the path of least resistance, as we may have to pay for it later.

      In short, it's a consent issue. Many violate that, knowingly, because they care not for the consequences. Many others don't even seem to think about it, and just go ahead regardless. And it's so much easier behind closed doors. Open source in comparison, even if not everything is public, must contend with the fact that the actions and consequences are (PRs, telemetry traffic, etc), so it inhabits a space in which violating consent is much more easily held accountable (though no guarantee).

      Of course, this does not mean it's always done properly in open source. It's often an uphill battle to get telemetry that's off-by-default, where users explicitly consent via opt-in, as people see how that could easily be undermined, or later invalidated. Many opt-in mechanisms (e.g. a toggle in the settings menu) often do not have expiration built in, so fail to check at a later point that someone still consents. Not to say that's the way you must do it, just giving an example of a way that people seem to be more in favour of, as with the generally favourable response to such features making their way into "permissions" on mobile.

      We can see how the suspicion creeps in, informed by experience... but that's also known by another word: vigilance.

      So, users are not "okay" with it. There's a power imbalance where these companies are afforded the impunity because many are left to conclude they have no choice but to let them get away with it. That hasn't formed in a vacuum, and it's not so simple that we just pull back the curtain and reveal the wizard for what he is. Most seem to already know.

      It's proven extremely difficult to push alternatives. One reason is that information is frequently not ready-to-hand for more typical users, but another is that said alternatives may not actually fulfil the needs of some users: notably, accessibility remains hugely inconsistent in open source, and is usually not funded on par with, say, projects that affect "backend" performance.

      The result? Many people just give their grandma an iPhone. That's what's telling about the state of open source, and of the actual cultural trends that made it this way. The threat model is fraudsters and scammers, not nation-state actors or corporate malfeasance. This app has tons of profiling and privacy issues? So what? At least grandma can use it, and we can stay in contact, dealing with the very real cultural trends towards isolation. On a certain level, it's just pragmatic. They'd choose differently if they could, but they don't feel like they can, and they've got bigger worries.

      Unless we do different, the status quo will remain. If there's any agitation to be had, it's in getting more people to care about improving things and then actually doing them, even if it's just taking small steps. There won't be a perfect solution that appears out of nowhere tomorrow, but we only have a low bar to clear. Besides, we've all thought "I could do better than that", so why not? Why not just aim for better?

      Who knows, we might actually achieve it.


Those are microbenchmarks.

  • pgbench is not a microbenchmark.

    • From the docs: "pgbench is a simple program for running benchmark tests on PostgreSQL. It runs the same sequence of SQL commands over and over"

      While it might call itself a benchmark, it behaves very microbenchmark-y.

      The other numbers I and others have shared have been from actual production workloads. Not a simple program that tests the same sequence of commands over and over.
