Comment by cogman10

1 year ago

Depends on how things are approached.

You could, for example, take advantage of bump arena allocator in rust which would allow the linker to have just 1 alloc/dealloc. Mold is still using more traditional allocators under the covers which won't be as fast as a bump allocator. (Nothing would stop mold from doing the same).

Traditional allocators are fast if you never introduce much fragmentation with free, though you may still get some gaps and have some other overhead and not be quite as fast. But why couldn't you just LD_PRELOAD a malloc for mold that worked as a bump/stack/arena allocator and just ignored free if anything party stuff isn't making that many allocations?

  • > Traditional allocators are fast

    Really it's allocators in general. Allocations are perceived as expensive only because they are mostly dependent on the amortized cost of prior deallocations. As an extreme example, even GCs can be fast if you avoid deallocation because most typically have a long-lived object heap that rarely gets collected - so if you keep things around that can be long-lived (pooling) their cost mostly goes away.

    • Slight disagreement here.

      Allocation is perceived as slow because it is. Getting memory from the OS is somewhat expensive because a page of memory needs to be allocated and stored off. Getting memory from traditional allocators is expensive because freespace needs to be tracked. When you say "I need 5 bytes" the allocator needs to find 5 free bytes to give back to you.

      Bump allocators are fast because the operation of "I need 5 bytes" is incrementing the allocation pointer forward by 5 bytes and maybe doing a new page allocation if that's exhausted.

      GC allocators are fast because they are generally bump allocators! The only difference is that when exhaustion happens the GC says "I need to run a GC".

      Traditional allocators are a bit slower because they are typically something like an arena with skiplists used to find free space. When you free up memory, that skiplist needs to be updated.

      But further, unlike bump and GC allocators another fault of traditional allocators is they have a tendency to scatter memory which has a negative impact on CPU cache performance. With the assumption that related memory tends to be allocated at the same time, GCs and bump allocators will colocate memory. But, because of the skiplist, traditional allocators will scattershot allocations to avoid free memory fragmentation.

      All this said, for most apps this doesn't matter a whole lot. However, if you are doing a CPU/memory intense operation then this is stuff to know.

      1 reply →