Comment by aw1621107
4 days ago
> It is worth noting that the class of bugs described here (logic errors in highly concurrent state machines, incorrect hardware assumptions)
While the bugs you describe are indeed things that aren't directly addressed by Rust's borrow checker, I think the article covers more ground than your comment implies.
For example, a significant portion (most?) of the article is simply analyzing the gathered data, like grouping bugs by subsystem:
  Subsystem          Bug Count   Avg Lifetime
  drivers/can              446   4.2 years
  networking/sctp          279   4.0 years
  networking/ipv4        1,661   3.6 years
  usb                    2,505   3.5 years
  tty                    1,033   3.5 years
  netfilter              1,181   2.9 years
  networking             6,079   2.9 years
  memory                 2,459   1.8 years
  gpu                    5,212   1.4 years
  bpf                      959   1.1 years
Or by type:
  Bug Type           Count   Avg Lifetime   Median
  race-condition     1,188   5.1 years      2.6 years
  integer-overflow     298   3.9 years      2.2 years
  use-after-free     2,963   3.2 years      1.4 years
  memory-leak        2,846   3.1 years      1.4 years
  buffer-overflow      399   3.1 years      1.5 years
  refcount           2,209   2.8 years      1.3 years
  null-deref         4,931   2.2 years      0.7 years
  deadlock           1,683   2.2 years      0.8 years
And the section describing common patterns for long-lived bugs (10+ years) lists the following:
> 1. Reference counting errors
> 2. Missing NULL checks after dereference
> 3. Integer overflow in size calculations
> 4. Race conditions in state machines
All of which cover more ground than listed in your comment.
Furthermore, the 19-year-old bug case study is a refcounting error not related to highly concurrent state machines or hardware assumptions.
It depends on what they mean by some of these: are the state machine race conditions logic races (which Rust won’t trivially solve) or data races? If they are data races, are they the kind that Rust will catch (missing atomics/synchronization) or the kind it won’t (bad atomic orderings, etc.)?
It’s also worth noting that Rust doesn’t prevent integer overflow, and it doesn’t panic on it by default in release builds. Instead, the safety model assumes you’ll catch the overflowed number when you use it to index something (a constant source of bugs in unsafe code).
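To make the overflow point concrete, here is a minimal sketch. The helper names `wrapping_size` and `checked_size` are hypothetical and not from the article. The explicit `wrapping_mul` stands in for what a plain `*` does in a default release build, so the result does not depend on the build profile.

```rust
/// Hypothetical size calculation that wraps on overflow, like `*` in a
/// default release build (overflow checks off).
fn wrapping_size(count: u32, elem_size: u32) -> u32 {
    count.wrapping_mul(elem_size)
}

/// The same calculation, but overflow is reported as `None` instead.
fn checked_size(count: u32, elem_size: u32) -> Option<u32> {
    count.checked_mul(elem_size)
}

fn main() {
    let count = 1u32 << 31; // 2^31 elements
    let elem_size = 2;

    // 2^31 * 2 = 2^32, which does not fit in a u32 and wraps to 0.
    assert_eq!(wrapping_size(count, elem_size), 0);
    // The checked variant surfaces the overflow to the caller.
    assert_eq!(checked_size(count, elem_size), None);

    // Safe indexing is still bounds-checked, so a wrapped value is caught
    // at the point of use rather than at the point of the arithmetic.
    let buf = [0u8; 16];
    let bogus_index = 200u8.wrapping_add(100) as usize; // 300 mod 256 = 44
    assert!(buf.get(bogus_index).is_none());
}
```

In unsafe code that skips the bounds check (for example by doing raw pointer arithmetic with the wrapped size), nothing catches the bad value, which is where this turns into a memory-safety bug.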
I’m bullish about Rust in the kernel, but it will not solve all of the kinds of race conditions you see in that kind of context.
> are the state machine race conditions logic races (which Rust won’t trivially solve) or data races? If they are data races, are they the kind of ones that Rust will catch (missing atomics/synchronization) or the ones it won’t (bad atomic orderings, etc.).
The example given looks like a generalized example:
So I don't think you can draw strong conclusions from it.
> I’m bullish about Rust in the kernel, but it will not solve all of the kinds of race conditions you see in that kind of context.
Sure, all I'm trying to say is that "the class of bugs described here" covers more than what was listed in the parentheses.
The default Mutex struct in Rust makes it impossible to modify the data it protects without holding the lock.
"Each mutex has a type parameter which represents the data that it is protecting. The data can only be accessed through the RAII guards returned from lock and try_lock, which guarantees that the data is only ever accessed when the mutex is locked."
Even if used with more complex operations, the RAII approach means that the example you provided is much less likely to happen.
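A minimal sketch of what the quoted documentation describes, using `std::sync::Mutex`. The `Stats` struct and `count_packets` function are hypothetical names made up for illustration, not anything from the kernel or the article.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical shared state, standing in for something like a driver counter.
struct Stats {
    packets: u64,
}

/// Spawns `workers` threads that each record `per_worker` packets,
/// then returns the total seen.
fn count_packets(workers: usize, per_worker: u64) -> u64 {
    let stats = Arc::new(Mutex::new(Stats { packets: 0 }));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let stats = Arc::clone(&stats);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    // The only safe route to `packets` from a shared handle
                    // is through the guard returned by `lock`.
                    let mut guard = stats.lock().unwrap();
                    guard.packets += 1;
                    // The guard is dropped here, which releases the lock.
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = stats.lock().unwrap().packets;
    total
}

fn main() {
    assert_eq!(count_packets(4, 1_000), 4_000);
}
```

This rules out the unsynchronized-access kind of data race. It does not rule out a logic race where code reads a value, releases the lock, and later acts on the stale value under a second lock acquisition.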
I'd argue that while null dereferences and similar classes of bugs may decrease, logic errors will increase. Rust is not an extraordinarily readable language in my opinion, especially in the kernel, which has its own data structures. IMHO Apple did it right in their kernel stack: they have a restricted subset of C++ that you can write drivers with.
Which is also why, in my opinion, Zig is much more suitable: it actually addresses the readability aspect without bringing huge complexity with it.
I don’t think that the parent comment is saying all of the bugs would have been prevented by using Rust.
But in the listed categories, I’m equally skeptical that none of them would have benefited from Rust even a bit.
That’s not my point. I’m just saying that “state machine races” is too broad a category to say much about how Rust would or wouldn’t help.
> It’s also worth noting that Rust doesn’t prevent integer overflow
Add a single line to a single file and you get that enforced.
https://rust-lang.github.io/rust-clippy/stable/index.html#ar...
Why doesn't it surprise me that the CAN bus driver bugs have the longest average lifetime?
> Furthermore, the 19-year-old bug case study is a refcounting error
It always surprised me how the top-of-the line analyzers, whether commercial or OSS, never really implemented C-style reference count checking. Maybe someone out there has written something that works well, but I haven’t seen it.