
Comment by bastawhiz

21 hours ago

I didn't downvote, but I suspect it's an easy answer: the fix was like four lines.

At the end of the day, #1 and #3 both probably add a fairly significant amount of code and complexity, and it's not clear to me that either adds robustness or clarity. From the fix:

```
// If our first node has non-standard memory size, we can't reuse
// it. This is because our initBuf below would change the underlying
// memory length which would break our memory free outside the pool.
// It is easiest in this case to prune the node.
```

https://github.com/ghostty-org/ghostty/commit/17da13840dc71b...
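If I'm reading it right, the shape of that fix in C terms is roughly this (hypothetical names, not the actual Zig code):

```c
#include <stdlib.h>

/* Hypothetical free-list node; names are illustrative, not Ghostty's. */
struct node {
    struct node *next;
    size_t mem_len;        /* actual length of the backing allocation */
};

#define STD_SIZE 4096      /* assumed standard page size */

/* On reuse: if the node at the head of the free list has a
 * non-standard size, prune it (free it outright) rather than handing
 * it back out, since reinitializing it at the standard size would
 * desync mem_len from the real allocation. */
static struct node *pool_acquire(struct node **head) {
    while (*head && (*head)->mem_len != STD_SIZE) {
        struct node *pruned = *head;
        *head = pruned->next;
        free(pruned);      /* prune the non-standard node */
    }
    if (*head) {
        struct node *n = *head;
        *head = n->next;
        return n;
    }
    return NULL;           /* caller falls back to a fresh allocation */
}
```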

#3, it seems, would require a broader change. The size is effectively immutable now (assuming I'm understanding your comment correctly): non-standard pages never change size; they get discarded rather than resized.
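If you wanted that invariant enforced by the compiler rather than just observed, one way in C would be a const field (hypothetical sketch, not Ghostty's actual layout):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical page header with a size field that can't be
 * reassigned after construction. */
struct page {
    const size_t mem_len;  /* fixed for the lifetime of the page */
    /* ... rest of the metadata ... */
};

static struct page *page_new(size_t len) {
    struct page *p = malloc(sizeof *p);
    if (p) {
        struct page init = { .mem_len = len };
        /* Set once at construction via memcpy; a direct assignment
         * to mem_len would be rejected by the compiler. */
        memcpy(p, &init, sizeof init);
    }
    return p;
}
```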

#2 is interesting, but I think it won't work because the implementation of MemoryPool doesn't seem like it would make it easy to test ownership:

https://github.com/ghostty-org/ghostty/blob/17da13840dc71ba3...

You'd have to make some changes to be able to check the arena buffers, and that check would be far slower than the simple comparison.
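Roughly, in C terms, that slower check would have to walk every arena buffer, something like this (hypothetical names, not the real pool's internals):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical linked list of arena buffers. */
struct arena_buf {
    struct arena_buf *next;
    uint8_t *base;
    size_t len;
};

/* Ownership test over every arena buffer: O(number of buffers),
 * versus the O(1) size comparison the fix relies on. */
static bool pool_owns(const struct arena_buf *a, const void *ptr) {
    for (; a != NULL; a = a->next) {
        uintptr_t start = (uintptr_t)a->base;
        uintptr_t addr = (uintptr_t)ptr;
        if (addr >= start && addr < start + a->len)
            return true;
    }
    return false;
}
```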

Thank you. I think each of my options is pretty trivial in C. I guess what I'm not understanding for #3 is: if the size is immutable, how did it change in a way that caused the issue? The post said they changed the size of the page without changing the underlying size of the allocated memory. To me this is the big issue: there was a desync, where the underlying assumption is that the size tells you where the memory came from, and that the metadata's size and the allocation's size move in tandem across that boundary.

#1 and #2 are fixes for breaking that implicit trust. #1 still trusts the metadata. #2 is what I'd consider the most robust solution: not only is it ideally trivial (just compare whether a pointer is within a range, assuming Zig can do that), it doesn't rely on the metadata being correct. #3 prevents the desync in the first place.
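For concreteness, what I had in mind for #2 in C (hypothetical layout; as my ETA below notes, the pool isn't actually one contiguous block):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Assumed: one contiguous pool buffer, so ownership is a single
 * range comparison that never consults the page metadata. */
struct pool {
    uint8_t *buf;
    size_t len;
};

static bool in_pool_range(const struct pool *p, const void *ptr) {
    uintptr_t start = (uintptr_t)p->buf;
    uintptr_t addr = (uintptr_t)ptr;
    return addr >= start && addr < start + p->len;
}
```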

I really don't understand the code base well enough to say definitively that my approaches work, which I guess is what I'm really looking for feedback on. Looking at the MemoryPool, I think you're right that my assumption of it being a simple contiguous array was incorrect.

ETA: I think I'm actually very wrong about #2. Color me surprised that the Zig memory pool allocates each item separately instead of as one big block. Feels like a waste, but I'm sure they have their reasons. That's addCapacity in memory_pool.zig.
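Roughly what surprised me, sketched in C (hypothetical, not the actual Zig): items come from individual allocations rather than one slab, so there's no single range that could answer "does this pointer belong to the pool?"

```c
#include <stdlib.h>

struct item {
    struct item *next_free;
    unsigned char data[64];   /* hypothetical item payload */
};

struct item_pool {
    struct item *free_list;
};

static struct item *pool_create(struct item_pool *p) {
    if (p->free_list != NULL) {
        struct item *it = p->free_list;  /* reuse a freed item */
        p->free_list = it->next_free;
        return it;
    }
    return malloc(sizeof(struct item));  /* otherwise allocate fresh */
}
```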

  • I'm not 100% sure, but my understanding was that the non-standard pages are always larger than the standard pages. If you need more than a standard page, you always get a freshly allocated non-standard page. But when one was released, it was being treated as though it were standard sized. The pool would then reuse that memory, but only at the standard size. So every released non-standard page leaked the difference between what was allocated and what was standard.

    Which is to say, I don't think the memory was actually being resized. I think the problem was the metadata for the page saying it had the (incorrect) standard size, plus the incorrect handling after the metadata was changed.

    • Yes, that last point was what I meant. I see no reason the metadata's size field should get updated without some realloc of the memory it points to. I think I'll need to look into the actual code to see what's going on there, though, because we may both be misunderstanding. It just seems very error-prone to decide how you free memory based on a field that, by the time you get to `free`, has no guaranteed relationship with where the memory came from. I think that should be fixed. What was done in the blog is more of a band-aid, imo.
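In C terms, the pattern I'm objecting to looks like this (hypothetical names, stubbed for illustration):

```c
#include <stdlib.h>

#define STD_SIZE 4096

struct page {
    size_t mem_len;    /* trusted at free time, but mutable before it */
};

/* Hypothetical pool hooks, stubbed out. */
static void pool_put(struct page *p) { (void)p; /* back on the free list */ }
static void free_large(struct page *p) { free(p); }

/* The error-prone shape: the free path branches on a metadata field
 * that, by the time we get here, has no guaranteed relationship with
 * where the memory actually came from. */
static void page_free(struct page *p) {
    if (p->mem_len == STD_SIZE)
        pool_put(p);       /* assumes standard size implies pool-owned */
    else
        free_large(p);     /* assumes anything else was heap-allocated */
}
```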