Comment by londons_explore
3 years ago
Having to decide ahead of time how much memory to allocate to the arena is... crap...
The vast majority of programmers don't want arbitrary 'out of memory in arena' errors just because the user inputted slightly more things than expected. Yes, I know that modern OS's don't actually allocate memory till you use it, but when you make widespread use of that functionality, typically your reuse of address space is poor and free'd stuff will be neither reused nor returned to the OS.
Likewise, not being able to free things within the arena is also crap - I'm sure there will be plenty of times the system is running low of RAM, but hundreds of applications have thousands of arenas, all half full of never-to-be-used again items that the OS can't reuse.
The actual point to understand is that a general purpose allocator is usually "mostly crap" because it requires an incredible internal complexity to meet all requirements. It's often better to use specialized local allocators, and write code which expects an allocator to be provided instead of calling into global allocator functions. Such specialized allocators can often be a lot simpler, while being at least as fast general purpose allocators like jemalloc or mimalloc.
Very often you don't actually need to track or manage the lifetime of individual objects, since related objects are often created and destroyed at the same time. For instance when parsing a JSON file you might end up with many individual nodes which can all be discarded at the same time once the parsing result has been consumed. With an arena allocator, you just throw away the memory for all those nodes at once, instead of calling a free/deallocate/delete functions tens- or hundreds-of-thousand times.
It's straightforward to make this slightly more dynamic by keeping a linked list of arenas and just add new one when you run out of space. You get most of the benefit with only a tiny increase in complexity.
That doesn't fix your second objection, but where you want to use this tends to be where you know object lifetimes are similar anyway.
E.g. way back we loaded fonts for an embedded device using t1lib, which on loading a font made hundreds of tiny malloc calls, all of which were freed at the exact same time when the font was freed. Adding an arena allocator both sped it up, reduced memory use (less malloc overhead), and it didn't matter at all that we couldn't free things within the arena because they'd always be freed at the same time anyway.
So the takeaway from that might be that arena allocators aren't always right, but you'd be surprised how often you can predictably group allocations into sets with similar enough lifetimes it doesn't matter much. A key to this is often the trick showed in the linked article: Don't just lump everything into the same arena; use different arenas for different lifetimes. You might well find you're left with so few allocations that don't seem to fit that you can afford to just keep those around in a single long-lived arena as well.
This...depends on your application. With a lot of small memory/embedded applications, deciding how much memory you need ahead of time is how you do things and normally you don't malloc at all. Because you absolutely need to know the memory bounds of your app. In that sort of environment, this approach can be useful.
You aren't the target audience I think? The main two reasons to use custom alloctors, or the two I run into at least, is when memory allocation profiles as a significant portion of your flamegraph, and/or there is a real probability of memory fragmentation preventing new allocation even when there is plenty of space left. That latter is often the case with embedded. The former can be, and is also pretty common any time you are cpu bound as in gaming or signal processing.
> Having to decide ahead of time how much memory to allocate to the arena is... crap...
A nice solution in many cases is to just keep a bunch of std::vector<T> (or equivalent) around for the types you need, and .clear() them all at the start of each request/frame/message/whatever being processed. Calling push_back on a vector will only allocate when the vector's already reached its capacity, which will happen very rarely after the first few runs, so the hot loop will usually be allocation-free, but without needing to allocate a fixed amount of memory ahead of time.
Generally, std::vector not going to work for allocator backend.
The issue being, std::vector relocates all elements every time it increases the capacity. Therefore, adding an element to std::vector may invalidate addresses of all previously added vector elements.
You gonna have to adjust your higher-level data structures which use that allocator, replacing pointers with offsets relative to the start of that std::vector. This introduces another level of indirection, adds complexity, and in some edge cases may even ruin the performance.
std::vector is fine if you access elements by index, instead of pointer, which can be a perfectly fine approach. Of course this kills the general purpose allocator aspect, since you're not getting back pointers, only offsets.
However, if you combine that with generation numbers, you can make yourself a very handy container with stable and safe references. slotmap [1] comes to mind.
[1]: https://docs.rs/slotmap/latest/slotmap/
Yes, that's true if you need higher-level datastructures, but if your functions are just taking vectors/spans of stuff as input then it works out fine.
std::deque, although it would be nice to use an implementation where you can control the block size.
1 reply →
Your second paragraph doesn't make sense because the whole point of this type of allocation is to guarantee that you will reuse the same address space you just freed.