Comment by rtheunissen

2 years ago

Thank you for sharing this resource, I was not aware of it. I am happy to see the inclusion of LBSTs there too.

Re: binary symmetry, if I'm understanding correctly, another author that makes use of the symmetry is Ben Pfaff in libavl [1]. At the top of [2], which seems a bit misplaced now, I wrote:

>A choice was made to not unify the symmetric cases using the direction-based technique of Ben Pfaff and others because it makes the logic more difficult to follow even though there would be less code overall.
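For readers unfamiliar with the direction-based technique mentioned in the quote, here is a minimal sketch in Go of the general idea. It is not code from libavl or from the bst repository: the `Node`, `rotate`, and `insert` names are hypothetical. The children live in a two-element array, so one function parameterized by a direction covers both mirror-image cases.

```go
package main

import "fmt"

// Node keeps both children in one array so a direction can index them:
// link[0] is the left child, link[1] is the right child.
// (Hypothetical type for illustration only.)
type Node struct {
	key  int
	link [2]*Node
}

// rotate rotates the subtree rooted at p toward dir and returns the new root.
// dir = 1 is a right rotation (the left child rises);
// dir = 0 is a left rotation (the right child rises).
// p.link[1-dir] must not be nil.
func rotate(p *Node, dir int) *Node {
	c := p.link[1-dir]          // the child that will become the new root
	p.link[1-dir] = c.link[dir] // the child's inner subtree moves under p
	c.link[dir] = p             // p becomes the child's subtree on side dir
	return c
}

// insert is a plain, unbalanced BST insert that also uses a direction
// instead of separate left and right branches. Duplicate keys are ignored.
func insert(p *Node, key int) *Node {
	if p == nil {
		return &Node{key: key}
	}
	if key == p.key {
		return p
	}
	dir := 0
	if key > p.key {
		dir = 1
	}
	p.link[dir] = insert(p.link[dir], key)
	return p
}

func main() {
	var root *Node
	for _, k := range []int{2, 1, 3} {
		root = insert(root, k)
	}
	root = rotate(root, 1) // right rotation
	fmt.Println("root after right rotation:", root.key)
	root = rotate(root, 0) // left rotation restores the original shape
	fmt.Println("root after left rotation:", root.key)
}
```

The trade-off described in the quote shows up even at this size: there is only one rotation to maintain, but the reader has to work out what `1-dir` means at each use instead of reading `left` and `right` directly.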

The choice of Go was to provide implementations that are both reliable to benchmark (though not as robust as C or Rust, for example) and easy to read. I would like to further reduce abstraction by decomposing common parts such that all the strategies are "one-file" references. This is then effectively the opposite of what the macro-based implementation achieves. Both have value, of course.

[1] https://adtinfo.org/libavl.html/BST-Node-Structure.html

[2] https://github.com/rtheunissen/bst/blob/main/trees/avl_botto...

2 comments


cb321 2 years ago

LBSTs are pretty rare, too. :) Re: symmetry reasoning troubles, another benefit is that it enforces the symmetry algebraically rather than relying on code edits maintaining it (& multiplying it). As to DRY vs. reducing abstraction aka explicitness, I favor the former but, yeah, people are indeed passionate in both directions. E.g., see Java. :-)

Along that abstraction line, it perhaps bears emphasizing that it is not all subjective. E.g., being able to have 1-byte or 2-byte pointers can really shrink the per-node space overhead from 16B (two 8B pointers) to 2..4B, possibly total node size from 20B to 8B for a 4B payload (like a float32), or a 2.5x overall space saving, an objective and interesting amount that might well keep a working set resident in faster memory. Indeed, even allowing any number of bits, like two links of 20 bits each, is thinkable. Of course, that limits the number of nodes to what the index width can address, but that can be ok (such as inside 1 B-tree node or when populations are otherwise easily bounded). But then you pretty much must abstract "allocation/dereferencing" as done in that generic C package or analogously. (Others elsethread were discussing memory overheads vs. B-trees, but not at this API level and more related to tree/indirection depth in uncached worst cases.)
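To make the space arithmetic above concrete, here is a rough Go sketch of 2-byte "pointers" as indices into a pool, with allocation and dereferencing hidden behind a small type. It is not the API of the generic C package; `Index`, `Pool`, `Alloc`, `Insert`, `Contains`, and `nodeSize` are hypothetical names, and the tree is left unbalanced to keep the example short.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Index is a 2-byte "pointer": a position in the pool's node slice.
// Index 0 is reserved as the nil sentinel, so one pool can hold
// at most 65535 real nodes.
type Index uint16

const Nil Index = 0

const maxSlots = 1 << 16 // slots addressable by a uint16, sentinel included

// Node is two 2-byte links plus a 4-byte payload: 8 bytes in total,
// versus 20 bytes of fields with two 8-byte pointers.
type Node struct {
	link [2]Index // link[0] = left, link[1] = right
	key  float32
}

// Pool is the allocation/dereferencing abstraction: tree code only sees
// Index values and never holds real pointers across allocations.
type Pool struct {
	nodes []Node
}

func NewPool() *Pool {
	return &Pool{nodes: make([]Node, 1)} // slot 0 is the nil sentinel
}

// Alloc returns the index of a fresh node, or false when the index
// space is exhausted.
func (p *Pool) Alloc(key float32) (Index, bool) {
	if len(p.nodes) >= maxSlots {
		return Nil, false
	}
	p.nodes = append(p.nodes, Node{key: key})
	return Index(len(p.nodes) - 1), true
}

// Insert adds key below root and returns the (possibly new) root.
// Equal keys are placed to the right.
func (p *Pool) Insert(root Index, key float32) (Index, bool) {
	n, ok := p.Alloc(key)
	if !ok {
		return root, false
	}
	if root == Nil {
		return n, true
	}
	cur := root
	for {
		dir := 0
		if key >= p.nodes[cur].key {
			dir = 1
		}
		next := p.nodes[cur].link[dir]
		if next == Nil {
			p.nodes[cur].link[dir] = n
			return root, true
		}
		cur = next
	}
}

// Contains reports whether key is reachable from root.
func (p *Pool) Contains(root Index, key float32) bool {
	for cur := root; cur != Nil; {
		k := p.nodes[cur].key
		if key == k {
			return true
		}
		dir := 0
		if key > k {
			dir = 1
		}
		cur = p.nodes[cur].link[dir]
	}
	return false
}

// nodeSize reports the in-memory size of one Node in bytes.
func nodeSize() uintptr { return unsafe.Sizeof(Node{}) }

func main() {
	p := NewPool()
	root := Nil
	for _, k := range []float32{3, 1, 2} {
		root, _ = p.Insert(root, k)
	}
	fmt.Println("node size in bytes:", nodeSize())
	fmt.Println("contains 2:", p.Contains(root, 2))
}
```

The cost is the one named above: the node count is capped by the index width, and every child access goes through the pool rather than a bare pointer.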

Anyway, I just thought that package might provide color along some other less traveled roads in this space; I was not trying to direct or redirect your own effort. It's a nice write-up of what you have so far.

rtheunissen 2 years ago

That was how I received your feedback. :)
My inclination towards lower abstraction in this project is entirely for the sake of reading and reference, to minimize the need for the reader to re-compose from various components split across files.
During development, abstraction helps because it makes prototyping faster and more consistent, but once everything is mostly "done", it can help the reader/student to maximize local reasoning.
Another comment mentioned they found it difficult to find the LBST implementation - this is exactly the sort of experience I hope to avoid.