← Back to context

Comment by rmunn

9 hours ago

I was trying out an implementation of Relaxed Radix-Balanced Trees, a data structure for immutable lists that allows splitting and joining them in constant* time. In a normal RRB Tree implementation, each leaf in the tree has up to 32 items in it, and a tree node that's not a leaf has up to 32 pointers to leaves or to lower tree nodes. There's also a tail, a single array of up to 32 which, when full, will be promoted to a leaf and a new empty tail array created. (Or the full tail will be promoted the next time an item is appended and a new tail of length 1 will be created).

For my implementation, I was using nodes of size 4 rather than size 32, because that let me test much easier; then after testing, I would switch the branching constant to 32 and run final tests. With a branching factor of 4, that means that a tree of size 24 contained one root node with two children, a left child that was a full tree of 16 (a tree node pointing to four leaf nodes, all of them full of 4 items), and a right child that was a leaf node of size 4. That makes 20 items in the tree, plus a full tail array of 4 making 24 items. The failure came when the next item was appended.

Now, if you've never implemented an RRB tree you won't have noticed the mistake. But in the previous paragraph, when I said that the right child of the root was a leaf? That was the mistake; in an RRB tree all leaves are supposed to be at the same level, and the right child of the root should have been a one-child node, whose child was a leaf of size 4. My code that promoted the root up one level when it was full (when the tree was size 20, a full 16-item tree plus a tail of 4) was faulty, and had created a new root with the previous root as the left child and the previous tail as the right child, instead of creating a one-child node as the right child of the new root with the 4-item leaf as its child. The tree still functioned for all read-only purposes (traversal, lookups, etc) and even for appends, as long as the appends did no more than fill the tail. Once the tail was full (making a total of 24 items total stored in the structure) and I needed to append a 25th item, the tail-promotion code found an illegal tree state and did the wrong thing. I no longer remember which wrong thing it did, just that once that 25th item was appended, the resulting tree no longer passed all the property checks.

* Constant for all practical purposes. In fact it's O(log32 N), but since log32 N is no more than 7 when N is 2^32, the operation is bounded by a constant. So it's O(1) for all practical purposes: most lists of any reasonable size will translate into trees whose height is no more than 2 or 3, which means only 2 or 3 nodes will need to be split when splitting the entire list (and the RRB tree structure that models the list) into two parts. And likewise for joining two RRB structures together (a list append operation): there might be up to O(H) merged nodes where H is the height of the tree, which again no more than log32 N for a tree of branching factor 32.