"Unsafe code
Ropey uses unsafe code to help achieve some of its space and performance characteristics. Although effort has been put into keeping the unsafe code compartmentalized and making it correct, please be cautious about using Ropey in software that may face adversarial conditions.
Auditing, fuzzing, etc. of the unsafe code in Ropey is extremely welcome. If you find any unsoundness, please file an issue! Also welcome are recommendations for how to remove any of the unsafe code without introducing significant space or performance regressions, or how to compartmentalize the unsafe code even better."
I assume your commentary is that this is bad, but I'd like to know why. I see this criticism thrown at lots of libraries.
All safe code in rust is built on unsafe code. The standard library is full of unsafe code. The purpose of `unsafe` is to encourage that dangerous things are safely wrapped. In business logic I'd question using unsafe code directly, but in a performance critical low level memory management library that's exactly where I'd expect to see it.
Yes, and this means that for me to trust that the code is memory safe I need to trust the people who develop the standard library (or validate the unsafe usage myself). Rust has a good track record and a very good review process to ensure correctness of their "unsafe" block.
This library however? Do they know how to write "unsafe" blocks? I don't know! Maybe? If there were zero uses of "unsafe" in this library I would be able to adopt it without worrying about memory safety at all. In addition, I'm not that good at knowing whether an "unsafe" block is safe myself. It's not like I can review this cases myself and be confident.
(Memory safety is of course not everything, but bugs related to memory safety are much more annoying than other types of bugs.)
That's such a weird disclaimer, considering that an overwhelming majority of mission-critical software is written entirely in "unsafe" code (that is, C/C++).
"Please be cautious about using Linux/macOS/Windows/Firefox/Chrome/Safari in adversarial conditions." I've never read a statement like that, even though it would be more warranted than in this case.
And even unsafe Rust is far safer than C and C++. It still provides automatic memory management by default, the thread safety guarantees that come with ownership, and abstraction mechanics that make it harder to commit blunders that can lead to unsafety.
Rust is missing an abstraction over non-contiguous chunks of contiguous allocations of data that would make handling ropes seamless and more natural even for smaller sizes.
C# has the concept of “Sequences” which is basically a generalization of a deque with associated classes and apis such as ReadOnlySequence and SequenceReader to encourage reduced allocations, reuse of existing buffers/slices even for composition, etc
Knowing the rust community, I wouldn’t be surprised if there’s already an RFC for something like this.
In general this sort of structure is the sort of thing I'd expect to see in an external crate in rust, not the standard library. So it's unlikely there's any RFCs, and more likely there's a few competing implementations lying around.
Bytes is essentially multiple slices over a optimistically single contiguous arc buffer. It's basically the inverse of what the root comment is after (an array of buffers). It's a rather strange crate because network IO doesn't actually need contiguous memory.
std does actually have a vague version of what the root comment wants: https://doc.rust-lang.org/std/io/struct.IoSlice.html and its sibling IoSliceMut (slicing, appending, inserting, etc. is out of scope for both - so not usable for rope stuff)
I wrote a utf-8 capable (but also fully generic over element type) rope implementation in Rust last fall (edit: 2023) and the main issue I ran into was the lack of a suitable regex library capable of working across slice boundaries. With some finagling I did manage to get it to work with most/all of the other relevant iterator/reader traits IIRC, and it benchmarked fairly well from a practical perspective, though it's not as fast as some of the other explicitly performance-focused implementations out there.
I'm afraid I might not have that much free time again for a long time, but maybe when I do, somebody will have solved the regex issue for me...
Hmm. It's similar to, but not fully, a `BufRead`? Maybe a `BufRead + Seek`. The slicing ability isn't really covered by those traits, though, but I think you could wrap a BufRead+Seek in something that effectively slices it.
A `BufRead + Seek` need not be backed by memory, though, except in the midst of being read. (A buffered normal file implements `BufRead + Seek`, for example.)
I feel like either Iterator or in some rare case of requiring generic indexing, Index, are more important than "it is composed of some number of linked memory allocations"?
A ReadOnlySequence seems to imply a linked-list of memory sections though; I'm not sure a good rope is going to be able to non-trivially interface with that, since the rope is a tree; walking the nodes in sequence is possible, but it's a tree walk, and something like ReadOnlySequenceSegment::Next() is then a bit tricky. (You could gather the set of nodes into an array ahead of time, but now merely turning it into that is O(nodes) which is sad.)
(And while it might be tempting to say "have the leaf nodes be a LL", I don't think you want to, as it means that inserts need to adjust those links, and I think you would rather have mutations produce a cheaply made but entirely new tree, which I don't think permits a LL of the leafs. You want this to make undo/redo cheap: it's just "go back to the last rope", and then all the ropes share the underlying character data that's not changing rope to rope. The rope in the OP seems to support this: "Cloning ropes is extremely cheap. Rope clones share data,")
I hadn't heard of rope data structures until I read about the xi editor (also written in Rust) a few years ago, but it looks like that's been discontinued.
"I want to build an editor, but first I must solve rendering 2D graphics purely on the GPU, invent a parallelizable path solver, and code a human perception-based color value manipulation library."
How would you associate non-character data with ranges of characters, such as for syntax coloring, semantic links, and references to points in the text?
(I couldn't find a mention of this in the README, design.md, or examples.)
In Emacs buffers, the concepts include text properties, overlays, and markers.
But, within this API, is there any support for the associations with non-character data?
For example, if you delete some text these Ropey data structure, does Ropey have facilities to update the associated non-character data (such as deleting all or part of one or more chunks of the non-character data, and/or updating positional information)? Or do you have to do that separately outside of Ropey?
Can someone explain to me why ropes are better than RRB trees? I havent implemented ropes in a long time, but I remember that there was very little performance benefits over gap buffers even for things like multiple cursors until the document became 10+ mb.
I have always thought that a text editor using rrb-trees would probably be the easiest option that would ensure decent performance for small files, and good performance for large files while also being great for random access or linear search.
I am interested in this. At my job we have shared systems where engineers often open very large files (10+GB) using vim/gvim and it loads the entire thing into memory and often means these servers are memory starved. Would ropey help in these situations?
I am not sure, I have looked into whether vim mem maps files or not over the years and get conflicting answers. Some say no, some say newer versions do, or it depends file type (binary vs text) or it depends on what plugins are enabled etc. All I know is I often see gvim processes using tons of rss memory.
I imagine that text template renders could benefit. The general use case is any time you need to repeatedly splice (take slices, insert data in the middle etc) a text of significant length and performance is important. (I heard of this concept a long time ago, but rarely think about it.)
We use them pretty heavily in realtime collaborative editing libraries for text. Ie, text CRDTs. In diamond types, merging a branch into another branch requires (essentially) replaying a whole lot of edit operations. Using a rope makes that fast.
What text editor do you use where loading a large text file doesn't put that entire text file in memory? In a world where even sub-$100 devices have gigabytes of memory, this doesn't seem even remotely problematic.
FWIW, I use Helix as my main editor and every time it has crashed (probably a few dozen times over a year or two, I've filed issues), it's related to bad text position stuff, where it effectively goes "out of bounds" on the text data structure.
I think its mostly due to multiple buffers showing the same content, as opposed to this Ropey library directly.
i hope someone can use this to create an editor similar to notepad++ that is cross platform. I have not found an editor that can handle large files as well as notepad++ on non windows systems. Last I looked into this, the issue was lack of low level libraries to handle large files.
Also, I have to wonder when this fad of loudly announcing when something is written in Rust will finally come to pass. Maybe software written in other languages should loudly announce it in every second sentence? To me at least it's become as self-aggrandizingly cringe as "Sent from my iPhone" at the end of every email...
I get that of course, but on the other hand I'm sure you know what I'm getting at too: users of certain languages, platforms etc feel the need to announce it as a point of pride, or feature in itself, and frequently. Every language will have this to some degree (with the possible exception of COBOL, lol), but there are definite outliers.
I think you've just become primed to reflex on seeing the word rust. The Readme has only 2 uses of the word rust: the header, and in the features. One tells you the language the library is for, the other is used as context for a type.
Think about the implications of the language though. When something is written in rust, management of memory is safer, multi threaded applications are safer, etc... (due to the nature of the language). If something is written in C++, the developer might be more inclined to review the code and tests to ensure proper handling of memory as well as determining if (when there is) non-deterministic behavior is safe. Hence, when highlighting something is written in Rust, it might not be just for the buzzword but also for something like developer confidence.
I for one am fed up with people advertising that they are using rope for their string implementation in an editor or editor-adjacent library. Ugh. We get it. Niche datastructure, woah so cool. You really went above and beyond and read that appendix to your data structures and algorithms textbook...!
Is it advertised a lot? I know the basic idea of Ropes and that it's used for scripting languages a lot but I haven't seen it in ads. Maybe I haven't looked at enough new editors though.
From the Readme:
"Unsafe code Ropey uses unsafe code to help achieve some of its space and performance characteristics. Although effort has been put into keeping the unsafe code compartmentalized and making it correct, please be cautious about using Ropey in software that may face adversarial conditions.
Auditing, fuzzing, etc. of the unsafe code in Ropey is extremely welcome. If you find any unsoundness, please file an issue! Also welcome are recommendations for how to remove any of the unsafe code without introducing significant space or performance regressions, or how to compartmentalize the unsafe code even better."
I assume your commentary is that this is bad, but I'd like to know why. I see this criticism thrown at lots of libraries.
All safe code in rust is built on unsafe code. The standard library is full of unsafe code. The purpose of `unsafe` is to encourage that dangerous things are safely wrapped. In business logic I'd question using unsafe code directly, but in a performance critical low level memory management library that's exactly where I'd expect to see it.
> The standard library is full of unsafe code.
Yes, and this means that for me to trust that the code is memory safe I need to trust the people who develop the standard library (or validate the unsafe usage myself). Rust has a good track record and a very good review process to ensure correctness of their "unsafe" block.
This library however? Do they know how to write "unsafe" blocks? I don't know! Maybe? If there were zero uses of "unsafe" in this library I would be able to adopt it without worrying about memory safety at all. In addition, I'm not that good at knowing whether an "unsafe" block is safe myself. It's not like I can review this cases myself and be confident.
(Memory safety is of course not everything, but bugs related to memory safety are much more annoying than other types of bugs.)
That's such a weird disclaimer, considering that an overwhelming majority of mission-critical software is written entirely in "unsafe" code (that is, C/C++).
"Please be cautious about using Linux/macOS/Windows/Firefox/Chrome/Safari in adversarial conditions." I've never read a statement like that, even though it would be more warranted than in this case.
And even unsafe Rust is far safer than C and C++. It still provides automatic memory management by default, the thread safety guarantees that come with ownership, and abstraction mechanics that make it harder to commit blunders that can lead to unsafety.
Those are well-established languages. Rust's only selling point is its alleged safety
2 replies →
Rust is missing an abstraction over non-contiguous chunks of contiguous allocations of data that would make handling ropes seamless and more natural even for smaller sizes.
C# has the concept of “Sequences” which is basically a generalization of a deque with associated classes and apis such as ReadOnlySequence and SequenceReader to encourage reduced allocations, reuse of existing buffers/slices even for composition, etc
Knowing the rust community, I wouldn’t be surprised if there’s already an RFC for something like this.
I think you might be looking for the bytes crate, which is pretty widely used in networking code: https://docs.rs/bytes/latest/bytes/index.html
In general this sort of structure is the sort of thing I'd expect to see in an external crate in rust, not the standard library. So it's unlikely there's any RFCs, and more likely there's a few competing implementations lying around.
Bytes is essentially multiple slices over a optimistically single contiguous arc buffer. It's basically the inverse of what the root comment is after (an array of buffers). It's a rather strange crate because network IO doesn't actually need contiguous memory.
std does actually have a vague version of what the root comment wants: https://doc.rust-lang.org/std/io/struct.IoSlice.html and its sibling IoSliceMut (slicing, appending, inserting, etc. is out of scope for both - so not usable for rope stuff)
8 replies →
I wrote a utf-8 capable (but also fully generic over element type) rope implementation in Rust last fall (edit: 2023) and the main issue I ran into was the lack of a suitable regex library capable of working across slice boundaries. With some finagling I did manage to get it to work with most/all of the other relevant iterator/reader traits IIRC, and it benchmarked fairly well from a practical perspective, though it's not as fast as some of the other explicitly performance-focused implementations out there.
I'm afraid I might not have that much free time again for a long time, but maybe when I do, somebody will have solved the regex issue for me...
There is a crate that can handle regex over non contiguous data. It had some weak points, but overall is really good
https://crates.io/crates/regex-cursor
1 reply →
Hmm. It's similar to, but not fully, a `BufRead`? Maybe a `BufRead + Seek`. The slicing ability isn't really covered by those traits, though, but I think you could wrap a BufRead+Seek in something that effectively slices it.
A `BufRead + Seek` need not be backed by memory, though, except in the midst of being read. (A buffered normal file implements `BufRead + Seek`, for example.)
I feel like either Iterator or in some rare case of requiring generic indexing, Index, are more important than "it is composed of some number of linked memory allocations"?
A ReadOnlySequence seems to imply a linked-list of memory sections though; I'm not sure a good rope is going to be able to non-trivially interface with that, since the rope is a tree; walking the nodes in sequence is possible, but it's a tree walk, and something like ReadOnlySequenceSegment::Next() is then a bit tricky. (You could gather the set of nodes into an array ahead of time, but now merely turning it into that is O(nodes) which is sad.)
(And while it might be tempting to say "have the leaf nodes be a LL", I don't think you want to, as it means that inserts need to adjust those links, and I think you would rather have mutations produce a cheaply made but entirely new tree, which I don't think permits a LL of the leafs. You want this to make undo/redo cheap: it's just "go back to the last rope", and then all the ropes share the underlying character data that's not changing rope to rope. The rope in the OP seems to support this: "Cloning ropes is extremely cheap. Rope clones share data,")
Is buf-list[0] what you're describing?
[0]: https://crates.io/crates/buf-list
There's also a really nice implementation of Rope for C# here: https://github.com/FlatlinerDOA/Rope
Vec<Vec<T>>?
I hadn't heard of rope data structures until I read about the xi editor (also written in Rust) a few years ago, but it looks like that's been discontinued.
https://github.com/xi-editor/xi-editor
The authors of Xi are currently working on Xilem, an experimental reactive UI framework for Rust: https://github.com/linebender/xilem
In the announcement post, they mention that work on Xi is considered "on hold" rather than strictly discontinued: https://raphlinus.github.io/rust/gui/2022/05/07/ui-architect...
Legendary-tier yak shaving.
"I want to build an editor, but first I must solve rendering 2D graphics purely on the GPU, invent a parallelizable path solver, and code a human perception-based color value manipulation library."
6 replies →
Repo says "discontinued".
8 replies →
Zed uses something similar to ropes as well:
https://zed.dev/blog/zed-decoded-rope-sumtree
Zed's Sum Tree is my favorite datastructure ever and is the future of database indexes.
2 replies →
Zed seems to be a gui-oriented editor here: https://zed.dev/
1 reply →
Is it even possible to write any text editor without some form of rope data structure?
Here's a paper reviewing the various choices, that is often mentioned in discussions around data structures for text editors:
https://www.cs.unm.edu/~crowley/papers/sds.pdf
VSCode uses a piece table (https://code.visualstudio.com/blogs/2018/03/23/text-buffer-r...).
1 reply →
Gap buffers are the other classic option, and there are others too, e.g. piece tables.
Most certainly: gap buffers, piece tables, and line arrays are also popular choices.
How would you associate non-character data with ranges of characters, such as for syntax coloring, semantic links, and references to points in the text?
(I couldn't find a mention of this in the README, design.md, or examples.)
In Emacs buffers, the concepts include text properties, overlays, and markers.
That would depend on your editor's implementation.
But, within this API, is there any support for the associations with non-character data?
For example, if you delete some text these Ropey data structure, does Ropey have facilities to update the associated non-character data (such as deleting all or part of one or more chunks of the non-character data, and/or updating positional information)? Or do you have to do that separately outside of Ropey?
7 replies →
I'm sorry, it's only vaguely related, but maybe someone can share some ideas.
What would be some good use-cases for using Ropey with Emacs? Maybe re-formatting/beautifying huge json files or something like that?
I didn't have time yet to explore the project more closely, but it looks very interesting.
What a perfect readme.
Kudos to the author.
No need for sarcasm, maybe target auditory already knows what 'rope' is.
No sarcasm, it was good enough to compliment
Why do you assume sarcasm? I thought the readme was unsarcastically excellent as well.
Can someone explain to me why ropes are better than RRB trees? I havent implemented ropes in a long time, but I remember that there was very little performance benefits over gap buffers even for things like multiple cursors until the document became 10+ mb.
I have always thought that a text editor using rrb-trees would probably be the easiest option that would ensure decent performance for small files, and good performance for large files while also being great for random access or linear search.
I am interested in this. At my job we have shared systems where engineers often open very large files (10+GB) using vim/gvim and it loads the entire thing into memory and often means these servers are memory starved. Would ropey help in these situations?
No:
> On the other hand, Ropey is not good at:
> Handling texts that are larger than available memory. Ropey is an in-memory data structure.
Also, this is a rust library, not an editor application.
I am literally guessing at this point — but can they mmap those files?
(Or I mean can you shard the files/store the files more efficiently)
I am not sure, I have looked into whether vim mem maps files or not over the years and get conflicting answers. Some say no, some say newer versions do, or it depends file type (binary vs text) or it depends on what plugins are enabled etc. All I know is I often see gvim processes using tons of rss memory.
Didn't know about this data structure. What are some use cases other than text editors? The article on wikipedia [1] doesn't expand much on this.
[1] https://en.wikipedia.org/wiki/Rope_(data_structure)
I imagine that text template renders could benefit. The general use case is any time you need to repeatedly splice (take slices, insert data in the middle etc) a text of significant length and performance is important. (I heard of this concept a long time ago, but rarely think about it.)
We use them pretty heavily in realtime collaborative editing libraries for text. Ie, text CRDTs. In diamond types, merging a branch into another branch requires (essentially) replaying a whole lot of edit operations. Using a rope makes that fast.
"Handling texts that are larger than available memory. Ropey is an in-memory data structure."
That seems to make it of dubious use, not really suitable for a well-engineered text editor.
The fact that it's UTF-8 only is also a serious problem since files can contain arbitrary byte sequences.
What kind of wild use case involves editing a text file larger than modern ram?
I disagree that there are a lot of editors that can handle files not in RAM. But I'd like one. All I know of is 'less'.
What text editor do you use where loading a large text file doesn't put that entire text file in memory? In a world where even sub-$100 devices have gigabytes of memory, this doesn't seem even remotely problematic.
those gigabytes of memory on sub-$100 devices are usually occupied by other apps taking the same careless approach to memory, so it's very problematic
3 replies →
Any editors using this?
It seems like Helix is using it https://github.com/helix-editor/helix/blob/master/docs/archi...
FWIW, I use Helix as my main editor and every time it has crashed (probably a few dozen times over a year or two, I've filed issues), it's related to bad text position stuff, where it effectively goes "out of bounds" on the text data structure.
I think its mostly due to multiple buffers showing the same content, as opposed to this Ropey library directly.
2 replies →
Not that library in particular, but https://zed.dev/blog/zed-decoded-rope-sumtree
i hope someone can use this to create an editor similar to notepad++ that is cross platform. I have not found an editor that can handle large files as well as notepad++ on non windows systems. Last I looked into this, the issue was lack of low level libraries to handle large files.
It's not open source, but sublime text does well with large files (depending on your definition of large, but several GB is fine)
Helix is on Windows, right?,
You're someone. Start typing.
Author is perhaps better known for his really great path tracer (and attendant blog), Psychopath: https://github.com/cessen/psychopath
Also, I have to wonder when this fad of loudly announcing when something is written in Rust will finally come to pass. Maybe software written in other languages should loudly announce it in every second sentence? To me at least it's become as self-aggrandizingly cringe as "Sent from my iPhone" at the end of every email...
When it’s a library of code, the language it is written in is pretty pertinent information as that’s the language it has to be consumed from…
I get that of course, but on the other hand I'm sure you know what I'm getting at too: users of certain languages, platforms etc feel the need to announce it as a point of pride, or feature in itself, and frequently. Every language will have this to some degree (with the possible exception of COBOL, lol), but there are definite outliers.
1 reply →
I think you've just become primed to reflex on seeing the word rust. The Readme has only 2 uses of the word rust: the header, and in the features. One tells you the language the library is for, the other is used as context for a type.
Think about the implications of the language though. When something is written in rust, management of memory is safer, multi threaded applications are safer, etc... (due to the nature of the language). If something is written in C++, the developer might be more inclined to review the code and tests to ensure proper handling of memory as well as determining if (when there is) non-deterministic behavior is safe. Hence, when highlighting something is written in Rust, it might not be just for the buzzword but also for something like developer confidence.
When it stops baiting engagement with these sort of comments, probably.
Maybe it is already considered an achievement if someone manages to write a program in Rust.
I for one am fed up with people advertising that they are using rope for their string implementation in an editor or editor-adjacent library. Ugh. We get it. Niche datastructure, woah so cool. You really went above and beyond and read that appendix to your data structures and algorithms textbook...!
Is it advertised a lot? I know the basic idea of Ropes and that it's used for scripting languages a lot but I haven't seen it in ads. Maybe I haven't looked at enough new editors though.
Reminds me of xi.
built in rust btw