
Comment by Fiveplus

2 days ago

Before the "rewrite it in Rust" comments take over the thread:

It is worth noting that the class of bugs described here (logic errors in highly concurrent state machines, incorrect hardware assumptions) wouldn't necessarily be caught by the borrow checker. Rust is fantastic for memory safety, but it will not stop you from misunderstanding the spec of a network card or writing a race condition in unsafe logic that interacts with DMA.

That said, if we eliminated the 70% of bugs that are memory safety issues, the signal-to-noise ratio for finding these deep logic bugs would improve dramatically. We spend so much time tracing segfaults that we miss the subtle corruption bugs.

> It is worth noting that the class of bugs described here (logic errors in highly concurrent state machines, incorrect hardware assumptions)

While the bugs you describe are indeed things that aren't directly addressed by Rust's borrow checker, I think the article covers more ground than your comment implies.

For example, a significant portion (most?) of the article is simply analyzing the gathered data, like grouping bugs by subsystem:

    Subsystem        Bug Count  Avg Lifetime
    drivers/can      446        4.2 years
    networking/sctp  279        4.0 years
    networking/ipv4  1,661      3.6 years
    usb              2,505      3.5 years
    tty              1,033      3.5 years
    netfilter        1,181      2.9 years
    networking       6,079      2.9 years
    memory           2,459      1.8 years
    gpu              5,212      1.4 years
    bpf              959        1.1 years

Or by type:

    Bug Type         Count  Avg Lifetime  Median
    race-condition   1,188  5.1 years     2.6 years
    integer-overflow 298    3.9 years     2.2 years
    use-after-free   2,963  3.2 years     1.4 years
    memory-leak      2,846  3.1 years     1.4 years
    buffer-overflow  399    3.1 years     1.5 years
    refcount         2,209  2.8 years     1.3 years
    null-deref       4,931  2.2 years     0.7 years
    deadlock         1,683  2.2 years     0.8 years

And the section describing common patterns for long-lived bugs (10+ years) lists the following:

> 1. Reference counting errors

> 2. Missing NULL checks after dereference

> 3. Integer overflow in size calculations

> 4. Race conditions in state machines

All of which cover more ground than listed in your comment.

Furthermore, the 19-year-old bug case study is a refcounting error not related to highly concurrent state machines or hardware assumptions.

  • It depends on what they mean by some of these: are the state machine race conditions logic races (which Rust won’t trivially solve) or data races? If they are data races, are they the kind that Rust will catch (missing atomics/synchronization) or the kind it won’t (bad atomic orderings, etc.)?

    It’s also worth noting that Rust doesn’t prevent integer overflow, and it doesn’t panic on it by default in release builds. Instead, the safety model assumes you’ll catch the overflowed number when you use it to index something (a constant source of bugs in unsafe code).

    I’m bullish about Rust in the kernel, but it will not solve all of the kinds of race conditions you see in that kind of context.
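
    A minimal sketch of the overflow point above, assuming a size computed as `count * elem_size` in 32 bits. The function names are hypothetical and not taken from the article. `wrapping_mul` spells out what a plain `*` does when overflow checks are off (the default for release builds), and `checked_mul` is the opt-in alternative:

    ```rust
    /// Hypothetical size calculation. `wrapping_mul` makes explicit what an
    /// unchecked `*` does when overflow checks are disabled.
    fn alloc_size_wrapping(count: u32, elem_size: u32) -> u32 {
        count.wrapping_mul(elem_size)
    }

    /// Same calculation, but overflow is reported instead of wrapped.
    fn alloc_size_checked(count: u32, elem_size: u32) -> Option<u32> {
        count.checked_mul(elem_size)
    }

    fn main() {
        let count = 0x4000_0001u32;
        let elem_size = 4;

        // 0x4000_0001 * 4 = 0x1_0000_0004, which wraps to 4 in 32 bits.
        assert_eq!(alloc_size_wrapping(count, elem_size), 4);
        assert_eq!(alloc_size_checked(count, elem_size), None);
        assert_eq!(alloc_size_checked(16, 4), Some(64));

        // Safe indexing still notices the undersized buffer: `get` returns
        // `None` out of bounds. Unsafe code gets no such second chance.
        let buf = vec![0u8; alloc_size_wrapping(count, elem_size) as usize];
        assert!(buf.get(100).is_none());
    }
    ```

    The wrapped value is a perfectly valid `u32`, so nothing flags it until it is used as a length or an index.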

    • > are the state machine race conditions logic races (which Rust won’t trivially solve) or data races? If they are data races, are they the kind that Rust will catch (missing atomics/synchronization) or the kind it won’t (bad atomic orderings, etc.)?

      The example given looks like a generalized example:

          spin_lock(&lock);
          if (state == READY) {
              spin_unlock(&lock);
              // window here where another thread can change state
              do_operation();  // assumes state is still READY
          }
      

      So I don't think you can draw strong conclusions from it.

      > I’m bullish about Rust in the kernel, but it will not solve all of the kinds of race conditions you see in that kind of context.

      Sure, all I'm trying to say is that "the class of bugs described here" covers more than what was listed in the parentheses.


    • I don’t think that the parent comment is saying all of the bugs would have been prevented by using Rust.

      But in the listed categories, I’m equally skeptical that none of them would have benefited from Rust even a bit.


  • Why doesn't it surprise me that the CAN bus driver bugs have the longest average lifetime?

  • > Furthermore, the 19-year-old bug case study is a refcounting error

    It has always surprised me that the top-of-the-line analyzers, whether commercial or OSS, never really implemented C-style reference count checking. Maybe someone out there has written something that works well, but I haven’t seen it.

This is I think an under-appreciated aspect, both for detractors and boosters. I take a lot more “risks” with Rust, in terms of not thinking deeply about “normal” memory safety and prioritizing structuring my code to make the logic more obviously correct. In C++, modeling things so that the memory safety is super-straightforward is paramount - you’ll almost never see me store a std::string_view anywhere for example. In Rust I just put &str wherever I please, if I make a mistake I’ll know when I compile.
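
A small sketch of the `&str` habit described above, using a hypothetical `Header` type and `parse_header` function made up for illustration. The struct stores borrowed slices, and the lifetime parameter is what lets the compiler reject code that would leave them dangling:

```rust
/// A parsed header that borrows from the input instead of copying it.
/// The lifetime `'a` ties every `Header` to the buffer it points into.
struct Header<'a> {
    name: &'a str,
    value: &'a str,
}

/// Hypothetical parser for a single `name: value` line.
fn parse_header(line: &str) -> Option<Header<'_>> {
    let (name, value) = line.split_once(':')?;
    Some(Header { name: name.trim(), value: value.trim() })
}

fn main() {
    let line = String::from("Content-Length: 42");
    let header = parse_header(&line).expect("line contains a colon");
    assert_eq!(header.name, "Content-Length");
    assert_eq!(header.value, "42");

    // Uncommenting the next line is rejected at compile time (error E0505),
    // because `header` still borrows from `line` and is used afterwards:
    // drop(line);
    println!("{} = {}", header.name, header.value);
}
```

The equivalent C++ struct holding `std::string_view` members would compile either way, which is why storing views there calls for more caution.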

> It is worth noting that the class of bugs described here (logic errors in highly concurrent state machines, incorrect hardware assumptions) wouldn't necessarily be caught by the borrow checker. Rust is fantastic for memory safety, but it will not stop you from misunderstanding the spec of a network card or writing a race condition in unsafe logic that interacts with DMA.

Rust is not just about memory safety. It also has algebraic data types and RAII, among other things, which greatly help in catching these kinds of silly logic bugs.
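
As a rough illustration of those two features, here is a sketch with a hypothetical `Conn` enum and `InFlight` guard (neither comes from the article or from kernel code). The enum forces every `match` to handle every state, and the guard's `Drop` impl balances a counter on every exit path, which is the kind of bookkeeping that refcount bugs tend to get wrong:

```rust
use std::cell::Cell;

/// Hypothetical connection states; each variant carries only the data
/// that is valid in that state.
enum Conn {
    Closed,
    Connecting { attempts: u32 },
    Ready { session_id: u64 },
}

/// `match` must cover every variant: adding a new state to `Conn`
/// turns every non-exhaustive `match` into a compile error.
fn describe(conn: &Conn) -> String {
    match conn {
        Conn::Closed => "closed".to_string(),
        Conn::Connecting { attempts } => format!("connecting (attempt {attempts})"),
        Conn::Ready { session_id } => format!("ready (session {session_id})"),
    }
}

/// RAII guard: increments the counter on creation, decrements it on drop.
struct InFlight<'a>(&'a Cell<u32>);

impl<'a> InFlight<'a> {
    fn new(counter: &'a Cell<u32>) -> Self {
        counter.set(counter.get() + 1);
        InFlight(counter)
    }
}

impl Drop for InFlight<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() - 1);
    }
}

fn handle(conn: &Conn, counter: &Cell<u32>) -> Option<u64> {
    let _guard = InFlight::new(counter);
    match conn {
        Conn::Ready { session_id } => Some(*session_id),
        _ => None, // failure path: `_guard` still releases the count
    }
}

fn main() {
    let counter = Cell::new(0);
    assert_eq!(handle(&Conn::Ready { session_id: 7 }, &counter), Some(7));
    assert_eq!(handle(&Conn::Closed, &counter), None);
    assert_eq!(counter.get(), 0); // balanced on both paths
    assert_eq!(describe(&Conn::Connecting { attempts: 2 }), "connecting (attempt 2)");
}
```

None of this catches a wrong state transition that is still well-typed; it only removes the "forgot a case" and "forgot to release" variants.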

The concurrent state machine example looks like a locking error. If the assumption is that the state shouldn't change in the meantime, doesn't that mean the lock should stay held? In that case Rust locks can help: they can embed the data they protect, so you can't even touch it unless the lock is held.
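
A minimal sketch of that idea, using `std::sync::Mutex` as a stand-in (kernel code would use its own lock types) and a hypothetical `Device` type. Because `state` lives inside the mutex, the check and the operation happen under the same guard:

```rust
use std::sync::Mutex;

#[derive(Debug, PartialEq)]
enum State {
    Ready,
    Busy,
}

/// Hypothetical device: `state` can only be reached through the lock.
struct Device {
    state: Mutex<State>,
}

impl Device {
    /// Check-and-act under one guard, so there is no window between the
    /// `Ready` check and the operation.
    fn start_if_ready(&self) -> bool {
        let mut state = self.state.lock().unwrap();
        if *state == State::Ready {
            *state = State::Busy; // stands in for do_operation()
            true
        } else {
            false
        }
        // `state` (the guard) is dropped here, releasing the lock.
    }
}

fn main() {
    let dev = Device { state: Mutex::new(State::Ready) };
    assert!(dev.start_if_ready());
    assert!(!dev.start_if_ready()); // already Busy
}
```

It is still possible to copy the state out, drop the guard, and then act on the stale copy, so this makes the bug harder to write rather than impossible.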

Rust has more features than just the borrow checker. For example, it has a more fully featured type system than C or C++, which a good developer can use to detect some logic mistakes at compile time. This doesn't eliminate bugs, but it can catch some of them very early.
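
One common way to do that is the newtype pattern. The sketch below uses hypothetical `Bytes` and `Pages` types and an assumed 4096-byte page size; mixing up the two units becomes a type error rather than a runtime surprise:

```rust
/// Hypothetical newtypes: both wrap a `u32`, but the compiler treats
/// them as distinct, incompatible types.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Bytes(u32);

#[derive(Debug, Clone, Copy, PartialEq)]
struct Pages(u32);

const PAGE_SIZE: u32 = 4096; // assumed page size for this sketch

impl Pages {
    /// The only conversion path; returns `None` on overflow.
    fn to_bytes(self) -> Option<Bytes> {
        self.0.checked_mul(PAGE_SIZE).map(Bytes)
    }
}

/// Accepts a length in bytes only.
fn reserve(len: Bytes) -> u32 {
    len.0
}

fn main() {
    let pages = Pages(3);
    // reserve(pages); // rejected at compile time: expected `Bytes`, found `Pages`
    let bytes = pages.to_bytes().expect("no overflow");
    assert_eq!(reserve(bytes), 12288);
    assert_eq!(Pages(u32::MAX).to_bytes(), None);
}
```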

  • [dead]

    • > But unsafe Rust, which is generally more often used in low-level code, is more difficult than C and C++.

      I think "is" is a bit too strong. "Can be", sure, but I'm rather skeptical that all uses of unsafe Rust will be more difficult than writing equivalent C/C++ code.


It’s hilarious that you feel the need to preemptively take control of the narrative in anticipation of the Rust people that you fear so much.

Is this an irrational fear, I wonder? Reminds me of methods used in the political discourse.

  • People who make those kinds of remarks should be called out and shunned. The Rust community is tired of discrimination and being the butt of jokes. All the other inferior languages prey on its minority status, despite Rust being able to solve all their problems. I take offense to these remarks, I don't want my kids to grow up as Rustaceans in such a caustic society.

  • > It’s hilarious that you feel the need to preemptively take control of the narrative in anticipation of the Rust people that you fear so much.

    > Is this an irrational fear, I wonder? Reminds me of methods used in the political discourse.

    In a sad sort of way, I think it's hilarious that HN users have been so completely conditioned to expect Rust evangelism any time a topic like this comes up that they wanted to get ahead of it.

    Not sure who it says more about, but it sure does say a whole lot.

    • I don’t think evangelism is necessary anymore. Rust adoption is now a matter of time.

I’ve seen too many embedded drivers written by well-known companies that don't use spinlocks for data shared with an ISR.

At one point, I found serious bugs (crashing our product) that had existed for over 15 years. (And that was 10 years ago).

Rust may not be perfect, but it gives me hope that some classes of stupidity will either be avoided or made visible (like every function being unsafe because the author was a complete idiot).

> race condition in unsafe logic that interacts with DMA

It's worth noting that if you write memory safe code but mis-program a DMA transfer, or trigger a bug in a PCIe device, it's possible for the hardware to give you memory-safety problems by splatting invalid data over a region that's supposed to contain something else.

I don't think 70% of bugs are memory safety issues.

In my experience it's closer to 5%.

Eh... Removing concurrency bugs is one of the main selling points of Rust. And algebraic types are a real boost in situations where you have lots of assumptions.

No other top-level comments have since mentioned Rust[1] and TFA mentions neither Rust nor topics like memory safety. It’s just plain bugs.

The Rust phantom zealotry is unfortunately real.

[1] Aha, but the chilling effect of dismissing RIR comments before they are even posted...

  • Yes, I saw this last night and was confused because only one comment mentioned Rust, and I think it was deleted. I nearly replied "you're about to prompt 1,000 Rust replies with this" and here's what I woke up to lol

Rust has other features that help prevent logic errors. It's not just C plus a borrow checker.

Rust would prevent a number of bugs, as it can model state machine guarantees as well.
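
One way to read "model state machine guarantees" is the typestate pattern. The sketch below is hypothetical (the `Idle`, `Configured`, and `Running` types are invented for illustration): each state is its own type, and transitions consume the old state, so calling an operation in the wrong state does not compile:

```rust
/// Hypothetical typestate sketch: one type per state.
struct Idle;
struct Configured { baud: u32 }
struct Running { baud: u32, frames_sent: u32 }

impl Idle {
    /// Consumes `Idle`, so the old handle cannot be reused.
    fn configure(self, baud: u32) -> Configured {
        Configured { baud }
    }
}

impl Configured {
    fn start(self) -> Running {
        Running { baud: self.baud, frames_sent: 0 }
    }
}

impl Running {
    /// `send` only exists on `Running`.
    fn send(&mut self) -> u32 {
        self.frames_sent += 1;
        self.frames_sent
    }

    fn stop(self) -> Idle {
        Idle
    }
}

fn main() {
    let idle = Idle;
    // idle.send(); // compile error: no method `send` on `Idle`
    let mut running = idle.configure(500_000).start();
    assert_eq!(running.baud, 500_000);
    assert_eq!(running.send(), 1);
    assert_eq!(running.send(), 2);
    let _idle: Idle = running.stop();
    // running.send(); // compile error: `running` was moved by `stop`
}
```

This enforces ordering for a single owner of the handle. It says nothing about two threads racing on shared state, which still needs a lock or atomics.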

Rewriting it all in Rust is extremely expensive, so it won't be done (soon).

  • Expensive because 1/ a rewrite is never easy, or 2/ Rust is specifically tough for kernel and close-to-kernel code (because it catches errors and forces you to think about them for real, and because it makes some constructs, such as linked lists, really hard to implement)?

    • Both I'd say. Rust imposes more constraints on the structure of code than most languages. The borrow checker really likes ownership trees whereas most languages allow any ownership graph no matter how spaghetti it is.

      As far as I know that's why Microsoft rewrote TypeScript in Go instead of Rust.


Thanks for raising this. It feels like evangelists paint a picture of Rust as magic that squashes all bugs. My personal experience is rather different. When I gave Rust a whirl a few years ago, I happened to play with mio for some reason I can no longer remember. I had some basic PoC code that didn't work as expected. While I'm no Rust expert, I am too much of a fan of the scratch-your-own-itch philosophy, so I started to read the mio source code. After 5 minutes, I found the logic bug, submitted a PR, and moved on. But what stayed with me was the insight that if someone like me can casually find and fix a Rust library bug, propaganda is probably doing more work than expected. The Rust craze feels a bit like Java. Just because a language babysits the developer doesn't automatically mean better quality. At the end of the day, the dev needs to juggle the development process. Sure, tools are useful, but overstating safety is likely a route better avoided.