Comment by cb321

2 years ago

If you like this article then you might also be interested in: https://github.com/c-blake/bst

It's ANSI C with a bespoke macro generics system not Go, does not cover as many balancing rules, and is not written up as nicely. It does do something only weakly represented in your Go impls, AFAICT - "binary symmetry" - the reflection about left-right intrinsic to most ideas in this area. (Mehlhorn has a book that does this, but that is the only other source I know.) It also has API considerations like seek-edit separation and how to handle in-tree duplicate keys (FIFO/LIFO/etc. as opposed to values which are also collections).

Also, Re: B-trees among several comments here - edit heavy B-trees with very large nodes (driven by IO bandwidth-delay products) need some "mini-scalable" structure for their nodes since shifting (on average) half the entries can cost. That mini-scale could be another B-tree or it could be a binary search tree, perhaps adjusted to have 2-byte sized pointers into the little arena that is a node if 64Knode is enough. I once heard Sybase (now Microsoft SQL Server?) used skip lists. Anyway, this may be a remaining use case for binary trees even in the presence of a B-tree.

Thank you for sharing this resource, I was not aware of it. I am happy to see the inclusion of LBSTs there too.

Re: binary symmetry, if I'm understanding correctly, another author that makes use of the symmetry is Ben Pfaff in libavl [1]. At the top of [2], which seems a bit misplaced now, I wrote:

>A choice was made to not unify the symmetric cases using the direction-based technique of Ben Pfaff and others because it makes the logic more difficult to follow even though there would be less code overall.

The choice of Go was to provide implementations that are both reliable to benchmark (though not as robust as C or Rust for example) but also easy to read. I would like to further reduce abstraction by decomposing common parts such that all the strategies are "one-file" references. This is then effectively the opposite of what the macro-based implementation achieves. Both have value, of course.

[1] https://adtinfo.org/libavl.html/BST-Node-Structure.html

[2] https://github.com/rtheunissen/bst/blob/main/trees/avl_botto...

  • LBSTs are pretty rare, too. :) Re: symmetry reasoning troubles, another benefit is that it enforces the symmetry algebraically rather than relying on code edits maintaining it (& multiplying it). As to DRY vs. reducing abstraction aka explicitness, I favor the former but, yeah, people are indeed passionate in both directions. E.g., see Java. :-)

    Along that abstraction line, it perhaps bears emphasizing that it is not all subjective. E.g., being able to have 1-byte or 2-byte pointers can really shrink the per node space overhead from 16B to 2..4B, possibly total node size from 20B to 8B for a 4B payload (like a float32) or 2.5x overall space saving, an objective and interesting amount that might well keep a working set resident in faster memory. Indeed, even allowing any number of bits like 20 bits*2 is thinkable. Of course, that limits the number of nodes to the address space of the numbers, but that can be ok (such as inside 1 B-tree node or when populations are elsewise easily bounded). But then you pretty much must abstract "allocation/dereferencing" as done in that generic C package or analogously. (Others elsethread were discussing memory overheads vs. B-trees, but not at this API level and more related to tree/indirection depth in uncached worst cases.)

    Anyway, I just thought that package might provide color along some other less traveled roads in this space..was not trying to direct or redirect your own effort. It's a nice write up of what you have so far.

    • That was how I received your feedback. :)

      My inclination towards lower abstraction in this project is entirely for the sake of reading and reference, to minimize the need for the reader to re-compose from various components split across files.

      During development, abstraction helps because it makes prototyping faster and more consistent, but once everything is mostly "done", it can help the reader/student to maximize local reasoning.

      Another comment mentioned they found it difficult to find the LBST implementation - this is exactly the sort of experience I hope to avoid.