Comment by ycombinatrix

4 months ago

Could std::rc::Weak solve the backreference problem?

3 comments

ycombinatrix

Weak is very helpful in preventing ownership loops which prevent deallocation. Weak plus RefCell lets you do back pointers cleanly. You call ".borrow()" to get access to the data protected by a RefCell. The run-time borrow panics if someone else is using the data item. This prevents two mutable pointers to the same data, which Rust requires.

Static analysis could potentially check for those potential panics at compile time. If that was implemented, the run time check, and the potential for a panic, would go away. It's not hard to check, provided that all borrows have limited scope. You just have to determine, conservatively, that no two borrow scopes for the same thing overlap.

If you had that check, it would be possible to have something that behaves like RefCell, but is checked entirely at compile time. Then you know you're free of potential double-borrow panics.

I started a discussion on this on a Rust forum. A problem is that you have to perform that check after template expansion, and the Rust compiler is not set up to do global analysis after template expansion. This idea needs further development.

This check belongs to the same set of checks which prevent deadlocking a mutex against itself. There's been some work on Rust static deadlock analysis, but it's still a research topic.

josephg 4 months ago

I didn't consider that. Looking at how weak references work, that might work. It would reduce the need for raw pointers and unsafe code. But in exchange, it would add 16 bytes of overhead to every node in my data structure. That's pure overhead - since the reference count of all nodes should always be exactly 1.

However, I'm not sure what the implications are around mutability. I use a Cursor struct which stores a reference to a specific leaf node in the tree. Cursors can walk forward in the tree (cursor.next_entry()). The tree can also be modified at the cursor location (cursor.insert(item)). Modifying the tree via the cursor also updates some metadata all the way up from the leaf to the root.

If the cursor stored a Rc<Leaf> or Weak<Leaf>, I couldn't mutate the leaf item because rc.get_mut() returns None if there are other strong or weak pointers pointing to the node. (And that will always be the case!). Maybe I could use a Rc<Cell<Leaf>>? But then my pointers down the tree would need the same, and pointers up would be Weak<Cell<Leaf>> I guess? I have a headache just thinking about it.

Using Rc + Weak would mean less unsafe code, worse performance and code thats even harder to read and reason about. I don't have an intuitive sense of what the performance hit would be. And it might not be possible to implement this at all, because of mutability rules.

Switching to an array improved performance, removed all unsafe code and reduced complexity across the board. Cursors got significantly simpler - because they just store an array index. (And inserting becomes cursor.insert(item, &mut tree) - which is simple and easy to reason about.)

I really think the Vec<Node> / Vec<Leaf> approach is the best choice here. If I were writing this again, this is how I'd approach it from the start.

ycombinatrix 3 months ago

Consider copy-pasting the code from Rc/Weak, then tweaking it to suit your needs (reduce the overhead).
I have done this before with stdlib stuff like io::Cursor and was pretty happy with the result.