
Comment by AlotOfReading

6 hours ago

    In general, the tradeoff is between optimisations that help large programs vs optimisations that help small programs.

Do you have concrete examples of large scale Java programs that are significantly more performant than comparable programs in native languages like C++? My understanding was that this dynamic hadn't fundamentally changed much since the 2010s, when Java was able to occasionally edge out a win in 1-2 benchmarks and would lose handily in others. My experience is that large scale Java programs remain a bit of a bear even after significant optimization effort (e.g. Bazel).

There are of course plenty of optimizations the JVM does that aren't possible AOT, but that doesn't imply an automatic win at large scales, as Rust demonstrates.

> Do you have concrete examples of large scale Java programs that are significantly more performant than comparable programs in native languages like C++?

Yes. I was working in a place that made large sensor-fusion applications, air-traffic control applications, and logistical planning, each in the 2-8MLOC range. Over time, we ported all of them from C++ to Java because C++'s performance overheads were too annoying to work around.

Of course, in principle it's always possible to match and perhaps even exceed Java's performance in a low-level language, but in practice it becomes ever more difficult as the program grows (and the cost remains with maintenance forever). The reason is that as programs grow, patterns become less regular (e.g. the variance in object lifetimes grows), the need for concurrency grows (and so the need for sharing objects among threads and for lock free data structures), and more general constructs are used (e.g. more dynamic dispatch). Improvements in modern allocators, as well as LTO and PGO have helped, but not enough to match the extent of optimisations you can do once you're free of the design constraints of low-level control and the focus on the worst case.
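To make the concurrency point a little more concrete: a lock-free structure shared among threads is comparatively simple to write when a GC handles reclamation. Below is a minimal sketch of a Treiber stack (a classic lock-free stack; the class name `TreiberStack` is hypothetical, not from any library) built on `java.util.concurrent.atomic.AtomicReference`. Because a popped node is reclaimed only once no thread can reach it, the code needs no hazard pointers or epoch-based reclamation, which a C++ or Rust version would typically need to avoid use-after-free and ABA problems.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration: a Treiber (lock-free) stack that relies on the GC
// for memory reclamation.
public class TreiberStack<T> {
    // Nodes are immutable once published.
    private static final class Node<T> {
        final T value;
        final Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    public void push(T value) {
        Node<T> oldHead;
        Node<T> newHead;
        do {
            oldHead = head.get();
            newHead = new Node<>(value, oldHead);
        } while (!head.compareAndSet(oldHead, newHead)); // retry if another thread won the race
    }

    /** Returns the most recently pushed value, or null if the stack is empty. */
    public T pop() {
        Node<T> oldHead;
        do {
            oldHead = head.get();
            if (oldHead == null) {
                return null;
            }
        } while (!head.compareAndSet(oldHead, oldHead.next));
        // No explicit free: another thread may still be reading oldHead, and the GC
        // reclaims it only once it is unreachable, so it cannot be reused underneath
        // a concurrent pop (which is what rules out ABA here).
        return oldHead.value;
    }

    public static void main(String[] args) throws InterruptedException {
        TreiberStack<Integer> stack = new TreiberStack<>();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1_000; i++) {
                    stack.push(i);
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        int count = 0;
        while (stack.pop() != null) {
            count++;
        }
        System.out.println(count); // prints 4000
    }
}
```

The sketch only shows why GC-backed memory makes this pattern cheap to write; it makes no claim about how it compares in speed to a tuned C++ or Rust equivalent.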

Java's thesis (not initially, but from very early on) was to rely on optimisations that can't be effectively employed by low-level languages because of their constraints, such as efficient memory management that benefits from being able to move most pointers in a program, and highly aggressive speculative optimisations (that are nondeterministic and can fail, resulting in deoptimisation). These optimisations tend to be global, and so they don't restrict program structure much, keeping maintenance costs lower, but they do help the average case at the cost of harming the worst case, which is a tradeoff that programs written in low-level languages don't want, and of course, it doesn't give the low-level control that's the entire point of low-level languages. Proving that thesis took a while, and longer in some aspects than others (moving collectors that don't pause were first released to a wide audience three years ago).

Of course, the differences aren't huge because the hot paths are typically small enough that they can be improved without adding too much cost (and hot paths require some manual optimisation in all languages), but gaining some performance as a side effect of significantly lowering costs is nice.

> There are of course plenty of optimizations the JVM does that aren't possible AOT, but that doesn't imply an automatic win at large scales, as Rust demonstrates.

I don't know what it is that Rust demonstrates given how few large scale projects have chosen it, but I've seen nothing to indicate that it doesn't suffer from the same performance issues as C++ compared to Java. In fact, someone I know who works at one of the world's largest tech companies told me that his team lead really wanted to do something in Rust, so they ported a small-to-medium service from Java to Rust. The result was such a huge performance drop that it wouldn't meet their minimum requirements. They were then forced to spend an additional 6 to 12 months carefully hand-optimising their Rust code until it matched Java's performance, and as a result all future maintenance will be more expensive. This is the exact same pattern I've seen with C++.

It's interesting that 20 years ago the people who said Java can't beat C++ on performance were experienced low-level programmers who had little or no experience with Java (and they were also right on several axes at the time). Today the people who say that are those with little experience with low-level languages (and are under the impression that low level languages are universally fast), but they will eventually learn about their fundamental performance issues just as we did decades ago.

I think that Rust in particular has made people without much experience in low-level programming (among whom Rust has made far more inroads than among those with a lot of experience in low-level programming) believe a certain story, namely that the problem with low level languages was memory safety and that that was the reason so many large programs switched to Java despite the performance sacrifices they had to make. Now that Rust fixes that problem, they can have their cake and eat it too! In reality, memory safety was indeed one of several significant problems with low level languages that Java sought to fix, but another was the performance issues low level languages suffer from as they get large (making good performance ever more costly). The tradeoff isn't performance (in large programs there might even be a performance gain) but low-level control, as that is what low-level languages are about. That was what they offered back then, and it's still what they offer now. Rust was first designed twenty years ago, back when things still looked a certain way (which is why, IMO, it repeated most of C++'s design mistakes), but these days I think that a better, more modern design of low-level languages is more focused on control, leaving large programs to high-level languages. Lack of memory safety has, without a doubt, been one of the things that made low-level languages less palatable to "ordinary" applications, but it was far from the only one.

Anyway, I'm sure the debate of which is faster, C++ (/Rust/Zig) or Java, will continue, and frankly, due to the nature of modern hardware, compiler, and runtime optimisations these days (when the question of the cost of some individual operation is all but meaningless and our ability to extrapolate from the performance of one program to another is close to nil), it largely comes down to empirical questions such as which program patterns are more or less common in the field and in which domains, as there are code and workload patterns that could give an advantage to either one.

  • “they ported a small-to-medium service from Java to Rust. The result was such a huge performance drop that it wouldn't meet their minimum requirements”

    That result says less about the performance of the languages than about the developers' competency with them.

    I just don’t buy that a task could be assigned to two teams with comparable expertise and domain knowledge in Rust and Java, and have the Rust result be at a “huge” performance deficit.

    No, I don’t believe that was an apples-to-apples comparison.

    • It may well be the case that it's not an apples-to-apples comparison, but as someone with over two decades of experience in both Java and C++, I find it not only unsurprising but a case of both Java and Rust doing exactly what they're designed to do.

      Rust is designed to be a low-level language, i.e. a language with maximal control with all of its pros and cons (albeit with memory safety, which C++ doesn't have), while Java is designed to address the performance issues low level languages have, particularly as they get larger, due to their control constraints. Without such constraints, it is easier to offer better performance for less effort especially as programs grow.

      In that particular program I was told that the differences were due to needing more locks in the Rust version. As has always been the case, they managed to achieve parity with much more effort (that is expected to continue over the lifetime of the software), but again, this is the explicit tradeoff of the approaches.

      Thirty years ago, and even twenty years ago (when Rust was first being designed) many still believed that more control is the only path to good performance, even if it comes with a lot of effort. Today it's clear that it's not the only path, and the debate is mostly around which program and workload patterns that happen to work better with one approach or the other are more common.

  • I’ve done performance-engineering for decades in Java, C++, and C for both data analytics and supercomputing/HPC. Java performs significantly worse than C++ in all cases without exception. This is the result you should expect from first principles; something has gone horribly wrong with your software optimization if Java is faster than C++ or even Rust.

    There are good reasons to use Java in environments that care about performance. Absolute performance can be traded for other concerns while still being good. It is why I did so much performance-engineering work in the language.

    Most performance is architectural in nature. Extremely granular control of scheduling is a prerequisite. System languages provide that control if you want it, Java does not.

    When you design software in Java, you accept that some software architectures are not available to you. If you care about performance, you would not port a software architecture optimized around the limitations of Java to a systems language.

    • > I’ve done performance-engineering for decades in Java, C++, and C for both data analytics and supercomputing/HPC. Java performs significantly worse than C++ in all cases without exception.

      I've done similar work (not supercomputing/HPC, but yes for soft and hard realtime software, including safety-critical software) and I couldn't disagree more.

      > This is the result you should expect from first principles; something has gone horribly wrong with your software optimization if Java is faster than C++ or even Rust.

      Strong disagreement here, but we need to be specific about what we mean when we say performance.

      It is undoubtedly true that for every Java program there exists a C++ program with the same performance, and the proof is simple: every Java program is, in effect, a C++ program (the JVM) with the Java classes as its input. But that C++ program is close to 2MLOC long. The same could also be said about a C++ program vs. an Assembly program, as every C++ program could be written as an Assembly program.

      But when I talk about performance, I refer to what I think most programmers care about: not how fast a program can hypothetically be given enough effort and expertise, but how fast my program can be within my budget.

      Both speculative compiler optimisations and memory management optimisations are simply not an option for low level languages due to their constraints, and they are very powerful global optimisations. Given a lot of expertise and effort (that must continue throughout the software's lifetime, and often increases as it evolves) you can work around these limitations, but Java was designed so that you can benefit from them, which means more performance per unit of effort.

      In large programs, more general constructs (e.g. dynamic dispatch) and patterns (concurrency, great variance in object lifetimes) grow in prevalence, and low-level languages require more effort and discipline to work around their shortcomings in these areas. Optimising JITs that allow aggressive speculative optimisations, as well as moving collectors, were invented and adopted to address these shortcomings. You could argue that the advanced mechanisms developed to address C++'s performance issues have failed to achieve their goal, although that won't be easy, as much of it comes down to empirical questions of which patterns arise more or less frequently in software. But given that this is what these mechanisms were at least intended to achieve, you certainly can't claim that they fail to do so "from first principles". Some compilation optimisations need speculation; some memory management optimisations need moving pointers. Not having these optimisations available in a program you can write without a lot of special effort cannot make it faster "from first principles".

      So no, I don't believe at all that something has to go wrong for a Java program to be faster than a C++ program given a certain budget for the program. Indeed, in larger, more complex programs, I believe the very opposite is true. In most situations, if you get the same performance in C++ as you do in Java, then something has gone terribly wrong with your Java program.

      As someone who's worked on a pretty famous JVM feature (virtual threads), I can tell you that we and the designers of low-level languages consciously make different performance tradeoffs because we optimise for different programs and people, and have different preferences when it comes to average case vs. worst case, but there is no universal dominance in performance to either one of these approaches over the other.

      One obvious example was our decision to remove Unsafe from Java. Some people voiced opposition citing a program speed competition (the "one-billion-row challenge" [1]) where Unsafe improved the performance of an entry by 25%. But we saw it as further motivation for the decision. Among over a dozen performance experts who submitted entries, only one was able to write a program efficient enough for Unsafe to make a big difference, and the variance in the results even among the top 20 or so entries was larger than Unsafe's improvement. By removing Unsafe, we would harm that one expert's program, but it would allow us to perform more aggressive constant-folding optimisations that would result in much greater performance improvements over the entire ecosystem. Even from a design philosophy perspective alone, this removal of control to the detriment of some programs "for the greater good" of performance over the entire ecosystem is almost unthinkable in low level languages, because control is what they're for.

      [1]: https://github.com/gunnarmorling/1brc
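      For readers wondering how an escape hatch like Unsafe can get in the way of constant folding, here is a minimal sketch of the underlying issue. It uses deep reflection rather than Unsafe itself, since reflection is a supported API, and the class names are hypothetical. As far as I know, HotSpot does not by default treat instance `final` fields as true constants, because code like this can overwrite them after construction; Unsafe's field-offset writes are another such route. Recent JDKs may print a warning when the final field is mutated.

```java
import java.lang.reflect.Field;

// Hypothetical example: a "final" instance field that is still mutable at runtime.
public class FinalFieldDemo {
    static final class Config {
        final int limit; // looks constant, but the JIT cannot simply assume it is

        Config(int limit) {
            this.limit = limit;
        }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Config config = new Config(10);

        // Deep reflection within the same (unnamed) module is permitted to write the field.
        Field field = Config.class.getDeclaredField("limit");
        field.setAccessible(true);
        field.setInt(config, 42);

        System.out.println(config.limit); // prints 42
    }
}
```

      Closing off such routes is what would let the JIT trust `final` and fold these reads into constants; the sketch shows only the obstacle, not the optimisation.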

  • > I don't know what it is that Rust demonstrates given that few large scale projects have chosen it, but I've seen nothing to indicate that it doesn't suffer from the same performance issues as C++ compared to Java.

    The point of bringing up Rust is that it also gives the compiler much more information to optimize on than C++, but actual performance is comparable or slightly worse in most benchmarks because the quality of C++ codegen is so high. Some of those Rust advantages are exactly the same things that have been touted as major advantages for Java over C++, like escape analysis and lifetimes.

    > Of course, in principle it's always possible to match and perhaps even exceed Java's performance in a low-level language, but in practice it becomes ever more difficult as the program grows (and the cost remains with maintenance forever).

    Sure, which is why I asked for real examples of whatever you consider a "large scale" program. I wasn't able to find anything via search before I replied, and the wiki page on Java performance [0] is repeating what I understood.

    [0] https://en.wikipedia.org/wiki/Java_performance

    • > Some of those Rust advantages are exactly the same things that have been touted as major advantages for Java over C++, like escape analysis and lifetimes.

      These aren't the biggest advantages. I would say that the biggest ones are aggressive speculative optimisations that allow inlining of virtual calls (by default, up to a depth of 15 calls) and the ability to freely move pointers, which allows alternatives to free-list-based memory management. Low-level languages can't afford pervasive speculative optimisation (as they're focused on the worst case) and can't allow most of their pointers to be moved (because they often share them directly with the hardware and/or device drivers).
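      Here is a minimal sketch of the kind of call site this refers to (class names hypothetical). `s.area()` is an interface call in the bytecode, but if only one receiver type is ever observed there, HotSpot's C2 compiler may speculatively inline it behind a cheap type check and deoptimise if that assumption is later violated. Running it with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` should show the JIT's inlining decisions, and the depth limit mentioned above corresponds, as far as I know, to the `-XX:MaxInlineLevel` flag. The exact diagnostic output varies by JDK version, so it isn't reproduced here.

```java
// Hypothetical example of a virtual (interface) call site that is monomorphic in practice.
interface Shape {
    double area();
}

final class Square implements Shape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    @Override
    public double area() {
        return side * side;
    }
}

public class InlineDemo {
    // In bytecode s.area() is an invokeinterface. If profiling only ever sees Square
    // here, C2 may inline Square.area() guarded by a type check, and deoptimise
    // (fall back to the interpreter) if another Shape implementation shows up later.
    static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = new Shape[1_000];
        for (int i = 0; i < shapes.length; i++) {
            shapes[i] = new Square(2.0);
        }
        double total = 0;
        // Call the method many times so it becomes hot and gets JIT-compiled.
        for (int iter = 0; iter < 10_000; iter++) {
            total = totalArea(shapes);
        }
        System.out.println(total); // prints 4000.0
    }
}
```

      An AOT compiler without profile data would generally have to keep the virtual call unless it can prove there is a single implementation; the JIT can instead bet on what it has observed and undo the bet if needed.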

      > and the wiki page on Java performance [0] is repeating what I understood.

      That may be because the information on that page seems to be current only as of 2011-2012. Java is now on version 26, BTW.


    • Slightly off topic -- Java-related Wikipedia pages are notoriously poor and possibly biased, for whatever reason. They are laughably outdated and contain a number of non-objective sentences that paint a much worse picture of the language than it deserves.

      I have even tried removing or rewriting some of the questionable sentences, but my edits weren't accepted.

  • We compiled one of our Java apps to a native binary using GraalVM (for encryption and secret management needs). As a side effect, the native binary's performance is excellent, and its startup time is significantly shorter than the JVM version's.

    I am not sure how it compares with C++, Rust, and Zig, but we benchmarked it against a similar Go binary: in load tests, the Java native version performs similarly to the Go binary. The only difference is RAM usage: the Java native binary uses 3 times as much RAM as the Go binary (and the JVM app took almost 10 times as much as the Go version).

    • The RAM difference is primarily because both Native Image (what you call GraalVM) and Go use much simpler and less efficient memory management techniques. HotSpot uses much more RAM by design, as there are inefficiencies caused by using too little of it. Memory management, and especially the very sophisticated approaches used only by the best-resourced teams, is a particularly misunderstood aspect.

      I gave a talk on the subject that I hope will be published soon, and while I can't reproduce it here, let me give an example that offers some basic intuition. Imagine needing to do some computation in two ways on a machine with 1GB of free RAM. You could run for 10s, taking up 100% CPU and consuming 80MB of RAM, or for 9s, taking up 100% CPU and consuming 800MB of RAM. The second is more efficient, despite taking up 10x more RAM and saving "only" 10% of CPU, regardless of the relative cost of RAM and CPU. This is because taking up 100% of the CPU effectively captures 100% of RAM (as no other program can use it), so both programs capture the entire 1GB, but the second one captures it for a second less. This scales to non-extreme situations because accessing RAM requires CPU, so using CPU means capturing RAM whether you use it or not. So HotSpot uses RAM when it can use it to balance CPU utilisation.
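      The arithmetic in that example can be spelled out as follows (class and method names hypothetical; 1GB taken as 1024MB). The key assumption, as described, is that a job saturating the CPU effectively captures all free RAM for its whole run, so what matters is free RAM × runtime rather than the RAM the job itself touches.

```java
// Hypothetical worked example of the "capturing RAM" argument (1 GB taken as 1024 MB).
public class RamCapture {
    static final long FREE_RAM_MB = 1024;

    // RAM the job actually touches, in MB-seconds.
    static long usedMbSeconds(long usedMb, long seconds) {
        return usedMb * seconds;
    }

    // RAM the job effectively captures: while it runs at 100% CPU, no other
    // program can make use of the free RAM, so it captures all of it.
    static long capturedMbSeconds(long seconds) {
        return FREE_RAM_MB * seconds;
    }

    public static void main(String[] args) {
        // Option A: 10 s at 100% CPU using 80 MB. Option B: 9 s at 100% CPU using 800 MB.
        System.out.println("A: used " + usedMbSeconds(80, 10) + " MB*s, captured " + capturedMbSeconds(10) + " MB*s");
        System.out.println("B: used " + usedMbSeconds(800, 9) + " MB*s, captured " + capturedMbSeconds(9) + " MB*s");
        // Prints:
        // A: used 800 MB*s, captured 10240 MB*s
        // B: used 7200 MB*s, captured 9216 MB*s
    }
}
```

      B touches nine times as many MB-seconds as A, yet captures 10% less of the machine, which is the sense in which it is the more efficient of the two.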

      In some situations it may not matter, and I assume that if Native Image and Go work just as well for you, then the workload isn't very high, but under high workloads, this can matter a lot.