Comment by _a1_

5 years ago

> As an implementation strategy, we do not care about memory leak because we really can't save that much memory by doing precise memory management. It is because most objects that are allocated during an execution of mold are needed until the very end of the program. I'm sure this is an odd memory management scheme (or the lack thereof), but this is what LLVM lld does too.

The fact that someone does it wrong doesn't mean that we should do it wrong as well ;(

I don't think I like this approach. It may work now, but will probably seriously limit the possibilities in the future.

If you think this approach is wrong, could you articulate why? This is a classic memory management strategy: if a program runs as a batch job, all of its memory is freed when the process exits. Any alternative memory management strategy would have to free and then reuse memory in order to show an improvement. If only a small amount of memory is freed, or if the freed memory is unlikely to be reused, the benefit of freeing is small, and the extra bookkeeping may actually slow the program down.

The fact that this program can successfully link Chrome means that we have fairly solid baseline performance metrics we can use for "big" programs. Chrome is just about the largest program you might ever need to link.

  • Yeah, this is also the strategy GCC and co. generally use, AFAIK. In a program like GCC, where a single invocation operates on a single file/unit, there's just not much benefit to trying to re-use memory; if GCC or LLVM were closer to "build servers" with persistent state that compiled and linked objects on demand, then it'd make sense, but in their current model it's easier and safer to just keep data around.

    • Another classic example is Apache's memory pool. In Apache, you allocate memory from memory pools associated with the current request or connection. A memory pool is freed as a whole when its request or connection is complete. mold's memory management scheme is not very different from that if you think of the entire linker run as a single "session" that uses a single memory pool.

  • > If you think this approach is wrong, could you articulate the reasons why you think it is wrong?

    Because later if you want to reuse parts of the code in a continuous environment (e.g. a daemon), then you will be surprised that you have memory leaks all over the place (or worse, someone else will discover it by accident).

    I don't have a problem with the end-of-process-releases-all-memory optimization. But I had the impression that the author is taking a let's-worry-about-leaks-later-because-OS-takes-care-of-it-for-free-(in-my-use-case) approach.

    The best approach would be to create a memory pool with fast allocation (e.g. TLAB allocation in Java, or the way computer games do it), so that you keep control over how and when the memory is freed.
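
For concreteness, here is a rough sketch of the kind of pool discussed in this thread: a bump-pointer arena whose allocations are never freed individually and are released as a whole, either by an explicit reset or when the arena is destroyed. This is not code from mold, lld, or Apache; the `Arena` class, its method names, and the 64 KiB default chunk size are all hypothetical and only illustrate the idea.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical bump-pointer arena (illustrative only, not taken from mold,
// lld, or Apache). Individual allocations are never freed; everything is
// released together by reset() or when the Arena is destroyed.
class Arena {
public:
  explicit Arena(std::size_t chunk_size = 64 * 1024)
      : chunk_size_(chunk_size) {}

  Arena(const Arena &) = delete;
  Arena &operator=(const Arena &) = delete;

  // Returns `size` bytes aligned to `align`, which must be a power of two.
  void *allocate(std::size_t size,
                 std::size_t align = alignof(std::max_align_t)) {
    assert(align != 0 && (align & (align - 1)) == 0);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(align) - 1;
    std::uintptr_t p = (cur_ + mask) & ~mask;
    if (chunks_.empty() || p + size > end_) {
      // The current chunk is exhausted: grab a new one that is large enough
      // for this request even after alignment padding.
      std::size_t n = std::max(chunk_size_, size + align);
      chunks_.push_back(std::make_unique<unsigned char[]>(n));
      cur_ = reinterpret_cast<std::uintptr_t>(chunks_.back().get());
      end_ = cur_ + n;
      p = (cur_ + mask) & ~mask;
    }
    cur_ = p + size;  // "Bump" the pointer past the new allocation.
    bytes_used_ += size;
    return reinterpret_cast<void *>(p);
  }

  // Constructs a T inside the arena. Destructors are never run, so this
  // sketch only accepts trivially destructible types.
  template <typename T, typename... Args>
  T *create(Args &&...args) {
    static_assert(std::is_trivially_destructible<T>::value,
                  "the arena never runs destructors");
    return new (allocate(sizeof(T), alignof(T))) T(std::forward<Args>(args)...);
  }

  // Frees every chunk at once, like ending a request or a "session".
  void reset() {
    chunks_.clear();
    cur_ = end_ = 0;
    bytes_used_ = 0;
  }

  std::size_t bytes_used() const { return bytes_used_; }
  std::size_t chunk_count() const { return chunks_.size(); }

private:
  std::size_t chunk_size_;
  std::vector<std::unique_ptr<unsigned char[]>> chunks_;
  std::uintptr_t cur_ = 0;  // Next free address in the current chunk.
  std::uintptr_t end_ = 0;  // One past the end of the current chunk.
  std::size_t bytes_used_ = 0;
};
```

A batch tool could create one `Arena` at startup and simply let it die with the process, while a daemon could create one per request and call `reset()` (or destroy it) when the request completes. That keeps the cheap allocation path and still gives an explicit point at which memory is released. The sketch is single-threaded; a TLAB-style design would give each thread its own arena.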