← Back to context

Comment by josephg

3 months ago

> Having gone full-in on this approach before, with some good success, it still feels wrong to me today. Contiguous storage may work for reasonable numbers of elements, but it's potentially blocking a huge contiguous chunk of address space especially for large numbers of elements.

That makes sense. If my btree was gigabytes in size, I might rethink the approach for a number of reasons. But in my case, even for quite large input, the data structure never gets more than a few megabytes in size. Thats small enough that resizing the vec has a negligible performance impact.

It helps that my btree stores its contents using lossless internal run-length encoding. Eg, if I have values like this:

    {key: 5, value: 'a'}
    {key: 6, value: 'a'}
    {key: 7, value: 'a'}

Then I store them like this:

    {key: [5..8), value: 'a'}

In my use case, this compaction decreases the size of the data structure by about 20x. There's some overhead in joining and splitting values - but its easily worth it.