Comment by jandrewrogers
9 months ago
That assertion was specifically qualified in the context of database engines, for which it is true. I definitely write bugs, but I haven't seen a segfault or memory corruption in years. That is more of a C thing than a C++ thing.
It is kind of difficult to have a segfault or memory corruption with explicitly paged object memory, since there can't be any pointers and these complex objects are bounds-checked at compile time. If you care about performance and scalability, you don't need to concern yourself with multi-threading as an issue either. The main way you'd expect to see memory corruption is if you try to read/write a page in the middle of a DMA operation to the same memory, and Rust doesn't help you with that either (though that would just be an ordinary logic bug in the scheduler).
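A minimal C++ sketch of what "explicitly paged object memory" with compile-time bounds checks might look like. The names Page, PagePool, Field, RowCount and NextPage and the 4 KiB page size are hypothetical, chosen for illustration and not taken from any particular engine. Objects live at fixed offsets inside pages and refer to each other by page id rather than by pointer, and a field whose layout would overrun its page fails to compile.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical page size; real engines match it to the storage/DMA unit.
constexpr std::size_t kPageSize = 4096;

// A page is just bytes. Objects inside it are addressed by offset, never by pointer.
struct Page {
    std::array<std::byte, kPageSize> bytes{};  // zero-initialized
};

// A typed field at a fixed offset inside a page. The layout is checked at
// compile time: instantiating a field that overruns the page is an error.
template <typename T, std::size_t Offset>
struct Field {
    static_assert(std::is_trivially_copyable_v<T>, "fields must be plain data");
    static_assert(Offset + sizeof(T) <= kPageSize, "field overruns the page");

    static T load(const Page& p) {
        T value{};
        std::memcpy(&value, p.bytes.data() + Offset, sizeof(T));
        return value;
    }
    static void store(Page& p, const T& value) {
        std::memcpy(p.bytes.data() + Offset, &value, sizeof(T));
    }
};

// Objects link to each other by page id, not by address, so nothing can
// dangle when a page is evicted and later reloaded somewhere else.
using PageId = std::uint32_t;

class PagePool {
public:
    explicit PagePool(std::size_t n) : pages_(n) {}
    Page& get(PageId id) {
        assert(id < pages_.size());  // debug-build check that the page id is valid
        return pages_[id];
    }
    std::size_t size() const { return pages_.size(); }

private:
    std::vector<Page> pages_;
};

// Hypothetical record layout: a row count followed by a "next page" link.
using RowCount = Field<std::uint64_t, 0>;
using NextPage = Field<PageId, 8>;
// using Broken = Field<std::uint64_t, 4092>;  // fails to compile as soon as it is used
```

The only runtime check left is that a page id is valid; every offset and size inside a page is settled by the type system.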
It is pretty easy to avoid segfaults and memory corruption in modern C++ if the software architecture doesn't allow you to create the conditions under which those are likely to occur.
So you're saying if you write your database engine in C++ you're not going to see any segfaults?
https://jira.mariadb.org/browse/MDEV-14248?jql=text%20~%20%2...
That is significantly dependent on the software architecture. MariaDB's design is not particularly modern (not a knock against MariaDB; it is an older system), and it employs none of the software architecture required for high-scale, high-performance kernels, which, as a side effect, makes it difficult to accidentally create the conditions for a segfault regardless of the language. The design motivation is actually optimal dynamic resource scheduling under heavy, unpredictable workloads, not memory safety. Rust's borrow checker doesn't work with these memory models, so you'll be in the same boat as C++ regardless.
I always found it theoretically interesting that schedule-based safety architectures, which are focused more on optimal resource allocation than on safety per se (traditionally it's all about extreme throughput), asymptotically converge on memory safety too as a practical matter, for the same reason they also require almost no locking. By doing the safety analysis (of many kinds, not just memory) at runtime, tiny dynamic modifications to the execution schedule are sufficient to provably (using TLA+ and similar) avoid many types of "unsafety" without the design compromises required to enable some of this analysis at compile time. It requires a non-traditional software architecture, and because of the level of execution control required it doesn't play nicely with a lot of existing code, but I see more and more systems being designed this way at the high end of the data infrastructure market.
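As a toy illustration of the scheduler, rather than locks, keeping page access safe (including the read/write-during-DMA case mentioned earlier in the thread), here is a hedged C++ sketch. It assumes a hypothetical single-threaded, per-core Scheduler in which each Task declares the pages it touches up front. Any task that touches a page with DMA in flight is deferred to a later pass instead of blocking. This is not the commenter's design and is not formally verified. It only shows the general shape of schedule-based conflict avoidance.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <unordered_set>
#include <utility>
#include <vector>

using PageId = std::uint32_t;

// A unit of work that declares, up front, which pages it will read or write.
struct Task {
    std::vector<PageId> pages;
    std::function<void()> run;
};

// Toy single-threaded scheduler (one per core, so no locks). Safety comes
// from ordering: a task never runs while any page it touches is under DMA.
class Scheduler {
public:
    void begin_dma(PageId p) { dma_in_flight_.insert(p); }
    void end_dma(PageId p) { dma_in_flight_.erase(p); }

    void submit(Task t) { queue_.push_back(std::move(t)); }

    // Run every task whose pages are idle and defer the rest to a later pass.
    // Returns the number of tasks that ran.
    std::size_t run_ready() {
        std::size_t ran = 0;
        std::deque<Task> deferred;
        while (!queue_.empty()) {
            Task t = std::move(queue_.front());
            queue_.pop_front();
            if (conflicts(t)) {
                deferred.push_back(std::move(t));  // reschedule instead of blocking
            } else {
                t.run();
                ++ran;
            }
        }
        queue_ = std::move(deferred);
        return ran;
    }

    std::size_t pending() const { return queue_.size(); }

private:
    bool conflicts(const Task& t) const {
        for (PageId p : t.pages)
            if (dma_in_flight_.count(p) != 0) return true;
        return false;
    }

    std::unordered_set<PageId> dma_in_flight_;
    std::deque<Task> queue_;
};
```

Because one thread owns both the queue and the DMA set, correctness here is a property of the ordering rather than of any mutex. A real system would also have to distinguish read and write sets and keep deferred tasks from starving.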
Sounds interesting. What reading on such modern architectures would you recommend?
No, they're saying that you will still see segfaults if you write it in Rust, because Rust's borrow checker is unusable in that environment.
That doesn't seem to be at all what they're saying, but in any case I checked SurrealDB (biggest Rust DB I could find) and there was exactly one report of a segfault and the developers couldn't reproduce it.
As far as I can tell, about 5% of MariaDB bugs mention segfaults, compared to 0.2% for SurrealDB.
I mean it's fairly obvious that even if some code in a Rust database is `unsafe` because it deals with manual paging and DMA and whatever, most of the code is going to be safe code.