
Comment by aw1621107

3 days ago

> It is worth noting that the class of bugs described here (logic errors in highly concurrent state machines, incorrect hardware assumptions)

While the bugs you describe are indeed things that aren't directly addressed by Rust's borrow checker, I think the article covers more ground than your comment implies.

For example, a significant portion (most?) of the article is simply analyzing the gathered data, like grouping bugs by subsystem:

    Subsystem        Bug Count  Avg Lifetime
    drivers/can      446        4.2 years
    networking/sctp  279        4.0 years
    networking/ipv4  1,661      3.6 years
    usb              2,505      3.5 years
    tty              1,033      3.5 years
    netfilter        1,181      2.9 years
    networking       6,079      2.9 years
    memory           2,459      1.8 years
    gpu              5,212      1.4 years
    bpf              959        1.1 years

Or by type:

    Bug Type         Count  Avg Lifetime  Median
    race-condition   1,188  5.1 years     2.6 years
    integer-overflow 298    3.9 years     2.2 years
    use-after-free   2,963  3.2 years     1.4 years
    memory-leak      2,846  3.1 years     1.4 years
    buffer-overflow  399    3.1 years     1.5 years
    refcount         2,209  2.8 years     1.3 years
    null-deref       4,931  2.2 years     0.7 years
    deadlock         1,683  2.2 years     0.8 years

And the section describing common patterns for long-lived bugs (10+ years) lists the following:

> 1. Reference counting errors

> 2. Missing NULL checks after dereference

> 3. Integer overflow in size calculations

> 4. Race conditions in state machines

All of which cover more ground than what's listed in your comment.
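To make pattern 3 concrete, here is a minimal userspace C sketch of an unchecked size calculation next to a checked one. It assumes GCC or Clang, because `__builtin_mul_overflow` is a compiler builtin rather than standard C. The function names are hypothetical and not taken from the article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Buggy pattern: count * size can silently wrap, so malloc() may return
 * a buffer much smaller than the caller believes it allocated. */
void *alloc_array_unchecked(size_t count, size_t size)
{
    return malloc(count * size);
}

/* Checked pattern: __builtin_mul_overflow (GCC/Clang) stores the product
 * in *total and returns true if it did not fit in size_t. */
void *alloc_array_checked(size_t count, size_t size)
{
    size_t total;

    if (__builtin_mul_overflow(count, size, &total))
        return NULL;  /* refuse rather than under-allocate */
    return malloc(total);
}
```

In kernel code the same idea is usually expressed through helpers such as `kmalloc_array()` or `array_size()` from `include/linux/overflow.h`.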

Furthermore, the 19-year-old bug case study is a refcounting error not related to highly concurrent state machines or hardware assumptions.

It depends on what they mean by some of these: are the state machine race conditions logic races (which Rust won’t trivially solve) or data races? If they are data races, are they the kind of ones that Rust will catch (missing atomics/synchronization) or the ones it won’t (bad atomic orderings, etc.)?
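A bad ordering is the kind of bug that compiles cleanly in either language: Rust's atomic types take an `Ordering` argument, and `Relaxed` is accepted in safe code. The C11 sketch below publishes a value through a flag using release/acquire ordering; it assumes POSIX threads, and all names are hypothetical. If the release store were weakened to `memory_order_relaxed`, the consumer's plain read of `payload` would no longer be ordered after the producer's write, which in C is a data race.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state: `payload` is published through `ready`. */
static int payload;
static atomic_bool ready;

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    /* Release: the write to `payload` happens-before any acquire load
     * that reads true from `ready`. */
    atomic_store_explicit(&ready, true, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    int *out = arg;

    /* Acquire pairs with the release store; spin until published. */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;
    *out = payload;  /* guaranteed to see 42 thanks to the pairing */
    return NULL;
}

int run_publish_demo(void)
{
    pthread_t prod, cons;
    int seen = -1;

    payload = 0;
    atomic_store(&ready, false);
    pthread_create(&cons, NULL, consumer, &seen);
    pthread_create(&prod, NULL, producer, NULL);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return seen;
}
```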

It’s also worth noting that Rust doesn’t prevent integer overflow, and in release builds it doesn’t panic on it by default; the arithmetic silently wraps. So the safety model relies on bounds checks catching the overflowed value when it’s later used to index something (a constant source of bugs in unsafe code).

I’m bullish about Rust in the kernel, but it will not solve all of the kinds of race conditions you see in that kind of context.

  • > are the state machine race conditions logic races (which Rust won’t trivially solve) or data races? If they are data races, are they the kind of ones that Rust will catch (missing atomics/synchronization) or the ones it won’t (bad atomic orderings, etc.).

    The example the article gives looks like a generalized one:

        spin_lock(&lock);
        if (state == READY) {
            spin_unlock(&lock);
            // window here where another thread can change state
            do_operation();  // assumes state is still READY
        }
    

    So I don't think you can draw strong conclusions from it.
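    For contrast, a hedged userspace sketch of the usual fix: perform the check and the state transition inside the same critical section. A `pthread_mutex_t` stands in for the kernel spinlock, and `try_operation()`, `start_operation_locked()` and the state names are hypothetical.

    ```c
    #include <assert.h>
    #include <pthread.h>
    #include <stdbool.h>

    enum dev_state { DEV_IDLE, DEV_READY, DEV_BUSY };

    /* Hypothetical device state guarded by `lock` (stand-in for a spinlock). */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static enum dev_state state = DEV_READY;

    /* Caller must hold `lock`; the assert documents the invariant that the
     * buggy version only assumed. */
    static void start_operation_locked(void)
    {
        assert(state == DEV_READY);
        state = DEV_BUSY;  /* claim the device before anyone else can */
    }

    /* Check and transition inside one critical section, so no other thread
     * can change `state` between the test and the operation. */
    bool try_operation(void)
    {
        bool started = false;

        pthread_mutex_lock(&lock);
        if (state == DEV_READY) {
            start_operation_locked();
            started = true;
        }
        pthread_mutex_unlock(&lock);
        return started;
    }
    ```

    Because the switch to `DEV_BUSY` happens while the lock is held, any long-running work can run after the unlock without reopening the window.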

    > I’m bullish about Rust in the kernel, but it will not solve all of the kinds of race conditions you see in that kind of context.

    Sure, all I'm trying to say is that "the class of bugs described here" covers more than what was listed in the parentheses.

    • Rust's standard Mutex type makes it impossible to access the data it protects without holding the lock.

      "Each mutex has a type parameter which represents the data that it is protecting. The data can only be accessed through the RAII guards returned from lock and try_lock, which guarantees that the data is only ever accessed when the mutex is locked."

      Even if used with more complex operations, the RAII approach means that the example you provided is much less likely to happen.
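      C has no borrow checker, but the `cleanup` variable attribute supported by GCC and Clang approximates the scope-based unlocking half of this, and the kernel's `guard()` helpers in `include/linux/cleanup.h` are built on the same attribute. Unlike Rust's `Mutex<T>`, nothing below stops other code from touching `counter` without the lock. The macro and function names are hypothetical.

      ```c
      #include <assert.h>
      #include <pthread.h>

      /* Runs automatically when a guard variable goes out of scope. */
      static void mutex_guard_release(pthread_mutex_t **m)
      {
          pthread_mutex_unlock(*m);
      }

      /* Hypothetical macro: lock now, unlock on every exit from the scope. */
      #define MUTEX_GUARD(m)                                              \
          pthread_mutex_t *mutex_guard_                                   \
              __attribute__((cleanup(mutex_guard_release))) =            \
              (pthread_mutex_lock(m), (m))

      static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
      static int counter;

      int increment_if_below(int limit)
      {
          MUTEX_GUARD(&counter_lock);
          if (counter >= limit)
              return -1;     /* early return still unlocks */
          return ++counter;  /* value is computed before the unlock runs */
      }
      ```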

    • I'd argue that while null-deref and similar classes of bugs may decrease, logic errors will increase. In my opinion Rust is not an extraordinarily readable language, especially in the kernel, which has its own data structures. IMHO Apple did it right in their kernel stack: they have a restricted subset of C++ that you can write drivers in.

      Which is also why, in my opinion, Zig is much more suitable: it actually addresses the readability aspect without bringing huge complexity with it.


  • I don’t think that the parent comment is saying all of the bugs would have been prevented by using Rust.

    But in the listed categories, I’m equally skeptical that none of them would have benefited from Rust even a bit.

    • That’s not my point. I’m just saying that “state machine races” is too broad a category to say much about how Rust would or wouldn’t help.

> Furthermore, the 19-year-old bug case study is a refcounting error

It has always surprised me that the top-of-the-line analyzers, whether commercial or OSS, never really implemented C-style reference count checking. Maybe someone out there has written something that works well, but I haven’t seen it.
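A minimal sketch of the invariant such a checker would have to track: every successful get must be matched by exactly one put on every path, including early error returns. The object type and function names are hypothetical, and the count is a plain `int` used from a single thread for brevity; kernel code would use `refcount_t` or `kref` instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object. */
struct obj {
    int refcount;
};

static int objects_freed;  /* lets the example observe releases */

struct obj *obj_new(void)
{
    struct obj *o = malloc(sizeof *o);

    if (o)
        o->refcount = 1;  /* creator holds the first reference */
    return o;
}

void obj_get(struct obj *o)
{
    o->refcount++;
}

void obj_put(struct obj *o)
{
    assert(o->refcount > 0);  /* a put without a get is also a bug */
    if (--o->refcount == 0) {
        free(o);
        objects_freed++;
    }
}

/* Buggy: the early return on error skips the matching put. */
int use_obj_leaky(struct obj *o, bool fail)
{
    obj_get(o);
    if (fail)
        return -1;  /* BUG: reference leaked, object can never be freed */
    obj_put(o);
    return 0;
}

/* Fixed: every path after the get reaches exactly one put. */
int use_obj(struct obj *o, bool fail)
{
    int ret = 0;

    obj_get(o);
    if (fail)
        ret = -1;
    obj_put(o);
    return ret;
}
```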