Comment by mavis

1 day ago

Switching to jemalloc instantly fixed an irksome memory leak in an embedded Linux appliance I inherited many moons ago. Thank you je, we salute you!

That’s because sane allocators that aren’t glibc will return unused memory periodically to the OS while glibc prefers to permanently retain said memory.

  • glibc will return memory to the OS just fine; the problem is that its arena design is extremely prone to fragmentation, so you end up with a bunch of arenas that are almost, but not quite, empty: they can't be released, but they can't really be used either.

    In fact, Jason himself (the author of jemalloc and TFA) posted an article on glibc malloc fragmentation 15 years ago: https://web.archive.org/web/20160417080412/http://www.canonw...

    And it's an issue to this day: https://blog.arkey.fr/drafts/2021/01/22/native-memory-fragme...

    • glibc does NOT return memory to the OS just fine.

      In my experience it delays it way too much, causing memory overuse and OOMs.

      I have a Python program that allocates 100 GB for some work, free()s it, and then calls a subprocess that takes 100 GB as well. Because the memory use is serial, it should fit in 128 GB just fine. But it gets OOM-killed, because glibc does not turn the free() into an munmap() before the subprocess is launched, so it needs 200 GB total, with 100 GB sitting around pointlessly unused in the Python process.
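
      Roughly, that pattern looks like the C sketch below (the block count, block size and the ./memory_hungry_child program are made-up placeholders; whether the freed pages actually go back to the kernel before the subprocess starts depends on the allocator and its thresholds):

          #include <stdlib.h>
          #include <string.h>

          #define NBLOCKS (4 * 1024)      /* many heap-sized blocks ... */
          #define BLKSIZE (64 * 1024)     /* ... of 64 KiB each, ~256 MiB in total */

          int main(void) {
              static char *blocks[NBLOCKS];

              /* Phase 1: the memory-hungry work. */
              for (int i = 0; i < NBLOCKS; i++) {
                  blocks[i] = malloc(BLKSIZE);
                  memset(blocks[i], 0xab, BLKSIZE);   /* touch pages so they count as RSS */
              }

              /* Phase 2: done with the data, free all of it. */
              for (int i = 0; i < NBLOCKS; i++)
                  free(blocks[i]);

              /* Phase 3: run the second memory-hungry step. If the freed pages are
               * still attached to this process at this point, the two peaks overlap
               * and the OOM killer can fire even though the use is strictly serial. */
              system("./memory_hungry_child");   /* hypothetical child program */
              return 0;
          }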

      This means that if you use glibc, you have no idea how much memory your system will use or whether it will OOM-crash, even if your applications are carefully designed to avoid it.

      Similar experience: https://sourceware.org/bugzilla/show_bug.cgi?id=14827

      It has been open for 13 years now. This stuff doesn't seem to get fixed.

      The fix in general is to use jemalloc with

          MALLOC_CONF="retain:false,muzzy_decay_ms:0,dirty_decay_ms:0"
      

      which tells it to immediately munmap() at free().

      So in jemalloc, the settings to control this behaviour seem to actually work, in contrast to glibc malloc.

      (I'm happy to be proven wrong here, but so far no combination of settings seems to actually make glibc return memory as described in its docs.)

      From this perspective, it is frightening to see the jemalloc repo being archived, because that was my way to make sure stuff doesn't OOM in production all the time.

  • Can you elaborate on this? I don't know much about allocators.

    How would the allocator know that some block is unused, short of `free` being called? Does glibc not return all memory after a `free`? Do other allocators do something clever to automatically release things? Is there just a lot of bookkeeping overhead that some allocators are better at handling?

    • They're not really correct; glibc will return memory to the OS. It just has some quirks about how and when it does it.

      First, some background: no allocator will return memory back to the kernel for every `free`. That's for performance and memory consumption reasons: the smallest unit of memory you can request from and return to the kernel is a page (typically 4kiB or 16kiB), and requesting and returning memory (typically called "mapping" and "unmapping" memory in the UNIX world) has some performance overhead.

      So if you allocate space for one 32-byte object for example, your `malloc` implementation won't map a whole new 4k or 16k page to store 32 bytes. The allocator probably has some pages from earlier allocations, and it will make space for your 32-byte allocation in pages it has already mapped. Or it can't fit your allocation, so it will map more pages, and then set aside 32 bytes for your allocation.
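
      You can watch that packing with a toy program (nothing here is guaranteed by the standard or by any particular allocator, it's just what you typically observe):

          #include <stdio.h>
          #include <stdlib.h>
          #include <stdint.h>

          int main(void) {
              /* Eight 32-byte allocations usually land close together, typically
               * within the same 4 KiB page, rather than each getting its own mapping. */
              for (int i = 0; i < 8; i++) {
                  void *p = malloc(32);
                  printf("allocation %d: %p (page 0x%lx)\n",
                         i, p, (unsigned long)(uintptr_t)p & ~0xfffUL);
              }
              return 0;
          }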

      This all means that when you call `free()` on a pointer, the allocator can't just unmap a page immediately, because there may be other allocations on the same page which haven't been freed yet. Only when all of the allocations which happen to be on a specific page are freed can the page be unmapped. In a worst-case situation, you could in theory allocate and free memory in such a way that you end up with 100 one-byte allocations spread across 100 pages, none of which can be unmapped; you'd be using 400kiB or 1600kiB of memory to store 100 bytes. (But that's not necessarily a huge problem, because it just means that future allocations would probably end up in the existing pages and not increase your memory consumption.)

      Now, the glibc-specific quirk: glibc will only ever unmap the last page, from what I understand. So you can allocate megabytes upon megabytes of data, which causes glibc to map a bunch of pages, then free() every allocation except for the last one, and you'd end up still consuming many megabytes of memory. Glibc won't unmap those megabytes of unused pages until you free the allocation that sits in the last page that glibc mapped.
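
      A rough way to poke at this is something like the sketch below (sizes are arbitrary, the outcome depends on the mmap and trim thresholds, and malloc_stats() is glibc-specific, so treat it as an experiment rather than a guaranteed demonstration):

          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>
          #include <malloc.h>

          enum { N = 10000, SZ = 64 * 1024 };    /* ~640 MiB of sub-mmap-threshold chunks */
          static char *blocks[N];

          static void stats(const char *label) {
              fprintf(stderr, "== %s ==\n", label);
              malloc_stats();                     /* glibc-specific: prints arena usage */
          }

          int main(void) {
              for (int i = 0; i < N; i++) {
                  blocks[i] = malloc(SZ);
                  memset(blocks[i], 1, SZ);
              }
              stats("after allocating");

              for (int i = 0; i < N - 1; i++)     /* free everything except the last block */
                  free(blocks[i]);
              stats("after freeing all but the last block");

              free(blocks[N - 1]);                /* now the end of the heap can be trimmed */
              stats("after freeing the last block too");
              return 0;
          }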

      This typically isn't a huge deal; yes, you're keeping more memory mapped than you strictly need, but if the application needs more memory in the future, it'll just re-use the free space in all the pages it has already mapped. So it's not like those pages are "leaked", they're just kept around for future use.

      It can sometimes be a real problem though. For example, a program could do a bunch of memory-intensive computation on launch requiring gigabytes of memory at once, then all that computation culminates in one relatively small allocated object, then the program calls free() on all the allocations it did as part of that computation. The application could potentially keep around gigabytes worth of pages which serve no purpose but can't be unmapped due to that last small allocation.

      If any of this is wrong, I would love to be corrected. This is my current impression of the issue but I'm not an authoritative source.

    • When `free()` is called, the allocator internally marks that specific memory area as unused, but it doesn't necessarily return that area back to the OS, for two main reasons:

      1. `malloc()` is usually called with sizes smaller than the sizes by which the allocator requests memory from the OS, which are at least page-sized (4096 bytes on x86/x86-64) and often much larger. After a `free()`, the freed memory can't be returned to the OS because it's only a small chunk in a larger OS allocation. Only once all memory within a page has been `free()`d may the allocator return that page to the OS, and even then it doesn't have to.

      2. After a `free()`, the allocator wants to hang on to that memory area because the next `malloc()` is sure to follow soon.

      This is a very simplified overview, and different allocators have different strategies for gathering new `malloc()`s in various areas and for returning areas back to the OS (or not).
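
      A small illustration of point 2: after a free(), the next malloc() of a similar size very often hands back the same memory instead of going to the kernel again (not guaranteed, just typical behaviour):

          #include <stdio.h>
          #include <stdlib.h>

          int main(void) {
              void *a = malloc(1024);
              printf("first  malloc(1024): %p\n", a);
              free(a);

              void *b = malloc(1024);             /* frequently reuses the block just freed */
              printf("second malloc(1024): %p\n", b);
              free(b);
              return 0;
          }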