
Comment by wahern

14 days ago

> What’s notable is that all of these bugs landed in a production Rust codebase, written by people who knew what they were doing

They knew how to write Rust, but clearly weren't sufficiently experienced with Unix APIs, semantics, and pitfalls. Most of those mistakes are exceedingly amateur from the perspective of long-time GNU coreutils (or BSD or Solaris base) developers, issues that were identified and largely hashed out decades ago, notwithstanding the continued long tail of fixes--mostly just a trickle these days--to the old codebases.

Reading that Canonical thread was jaw-dropping. Paraphrased: "Rust is more secure, security is our priority, therefore deploying this full-rewrite of core utils is an emergency. If things break that's fine, we'll fix it :)".

I would not want to run any code on my machines made by people who think like this. And I'm pro-Rust. Rust is only "more secure" all else being equal. But all else is not equal.

A rewrite necessarily has orders of magnitude more bugs and vulnerabilities than a decades-old well-maintained codebase, so the security argument was only valid for a long-term transition, not a rushed one. And the people downplaying user impact post-rollout, arguing that "this is how we'll surface bugs", and "the old coreutils didn't have proper test cases anyway" are so irresponsible. Users are not lab rats. Maintainers have a moral responsibility to not harm users' systems' reliability (I know that's a minority opinion these days). Their reasoning was flawed, and their values were wrong.

  • This leaves such a bad taste in my mouth. If you fucking found 44 CVEs with some relatively amateurish ones (I'm no security engineer but even I've done that exact TOCTOU mitigation before) in such a core component of your system a month before 26.04 LTS release (or a couple months if you count from their round 1), surely the response should be "we need to delay this to 28.04 LTS to give it time to mature", not "we'll ship this thing in LTS anyway but leave out the most obviously problematic parts"?

    The snap BS wasn't enough to move me since I was largely unaffected once stripping it out, but this might finally convince me to ditch.

    • It's insane that this is going into an LTS. It's the kind of experiment I'd expect them to play with in a non-LTS and revert in LTSes until it's fully usable, like they did with Wayland being the default, which started in 2017.

  • This is a people problem and Canonical just isn't good at hiring people

    • I’ve gotta agree. Some horror stories were going around about their interview process. It seemed highly optimized to select people willing to put up with insane top-down BS.

  • Agree with the point. Asking sincerely: how can I keep any Rust-rewrite packages from being installed on my machines? Does anyone know a way?

More than that: it seems that the Rust stdlib nudges the developer towards using neat APIs at an incorrect level of abstraction, like path-based instead of handle-based file operations. I hope I'm wrong.

  • Nearly every available filesystem API in Rust's stdlib maps one-to-one with a Unix syscall (see Rust's std::fs module [0] for reference -- for example, the `File` struct is just a wrapper around a file descriptor, and its associated methods are essentially just the syscalls you can perform on file descriptors). The only exceptions are a few helper functions like `read_to_string` or `create_dir_all` that perform slightly higher-level operations.

    And, yeah, the Unix syscalls are very prone to mistakes like this. For example, Unix's `rename` syscall takes two paths as arguments; you can't rename a file by handle; and so Rust has a `rename` function that takes two paths rather than an associated function on a `File`. Rust exposes path-based APIs where Unix exposes path-based APIs, and file-handle-based APIs where Unix exposes file-handle-based APIs.

    So I agree that Rust's stdlib is somewhat mistake-prone; not so much because it's being opinionated and "nudg[ing] the developer towards using neat APIs", but because it's so low-level that it's not offering much "safety" in filesystem access over raw syscalls beyond ensuring that you didn't write a buffer overflow.

    [0]: https://doc.rust-lang.org/std/fs/index.html

    • > So I agree that Rust's stdlib is somewhat mistake-prone; not so much because it's being opinionated and "nudg[ing] the developer towards using neat APIs", but because it's so low-level that it's not offering much "safety" in filesystem access over raw syscalls beyond ensuring that you didn't write a buffer overflow.

      `openat()` and the other `*at()` syscalls are also raw syscalls, which Rust's stdlib chose not to expose. While I can understand that this may not be straightforward for a cross-platform API, I have to disagree that Rust's stdlib is mistake-prone because it's so low-level. It's more mistake-prone than POSIX (in some respects) because it is missing a whole family of low-level syscalls.

      16 replies →

    • > For example, Unix's `rename` syscall takes two paths as arguments; you can't rename a file by handle

      And then there’s renameat(2), which takes two dirfds… and two paths resolved from there, and which has mostly the same issues rename(2) does (it doesn’t even take flags, so not even O_NOFOLLOW is available).

      I’m not sure what you’d need to make a safe renameat(), maybe a triplet of (dirfd, filefd, name[1]) from the source, (dirfd, name) from the target, and some sort of flag to indicate whether it is allowed to create, overwrite, or both.

      As the recent https://blog.sebastianwick.net/posts/how-hard-is-it-to-open-... talks about (just for file but it applies to everything) secure file system interaction is absolutely heinous.

      [1]: not path

      2 replies →

  • After reading this article, I'm inclined to think that the right thing for this project to do is to write their own library that wraps the Rust stdlib in a file-handle-based API (plus one method to get a file handle from a Path), rewrite the code to use that library rather than Rust stdlib methods, and then add a lint check that guards against any use of the Rust standard library file methods anywhere outside of that wrapper.

    • If that's the right approach, then it would be useful to make that library public as a crate, because writing such hardened code is generally useful. Possibly as a step before inclusion in the rust stdlib itself.

  • Unfortunately, it's not just the Rust stdlib, it's nearly every stdlib, if not every one. I remember being disappointed when Go came out that it didn't base the os module on openat and friends, and that was how many years ago now? I wasn't really surprised; the *at functions aren't what people expect, and people would probably have been screaming about "how weird" the file APIs were in this hypothetical Go continually up to this very day... but it's still the right thing to do. Almost every language makes it very hard to do the right thing while leaving the wrong thing so readily available.

    I'm hedging on the "almost" only because there are so many languages made by so many developers and if you're building a language in the 2020s it is probably because you've got some sort of strong opinion, so maybe there's one out there that defaults to *at-style file handling in the standard library because some language developer has the strong opinions about this I do. But I don't know of one.

    • Openat appeared in Linux in 2006 but not in FreeBSD until 2009; Go started being developed in 2007. It probably missed the opportunity by a year. It would have been the right thing to change the os module at some point in the last 18 years, however.

      1 reply →

  • If anything, I find the Rust standard library defaults to Unix too much for a generic programming language. You need to think very Unixy if you want to program Rust on Windows, unless you're directly importing the Windows crate and forgoing the Rust standard library. If you're writing COBOL-style mainframe programs, things become even more forced, though I suspect the overlap between Rust programmers and mainframe programmers who don't use a Unix-like is vanishingly small.

    This can also be a pain on microcontrollers sometimes, but there you're free to pretend you're on Unix if you want to.

    • If you want to support file I/O in the standard library, you have to choose _some_ API, and it either is limited to the features common to all platforms, or it covers all features but returns errors for calls a platform cannot support, or you pick a preferred platform and require all other platforms to mimic it as best they can.

      Almost all languages/standard libraries pick that last option, and many choose UNIX or Linux as the preferred platform, even though its file system API has flaws we’ve known about for decades (example: using file paths too often) or made decisions back in 1970 we probably wouldn’t make today (examples: making file names sequences of bytes; not having a way to encode file types and, because of that, using heuristics to figure out file types. See https://man7.org/linux/man-pages/man1/file.1.html)

      16 replies →

    • That's the same for the C or Python standard libraries. The difference is that in C you tend to use the Win32 functions more because they're easily reached for; but Python and Rust are both just as Unixy.

      1 reply →

> They knew how to write Rust, but clearly weren't sufficiently experienced with Unix APIs, semantics, and pitfalls.

The point of Rust is that you shouldn't have to worry about the biggest, easiest-to-fall-into pitfalls.

I think the author's point in this article is that a proper file system API should do the same.

Having panics in these is pretty amateur hour, even just at the Rust level. I could see it if they were alloc errors, which you can't handle, but expects and unwraps are inexcusable unless you are very carefully guarding them with invariants that prevent that code path from ever running.

Someone once coined a related term, "disassembler rage": the idea that every mistake looks amateurish when examined closely enough. It comes from people sitting in a disassembler and raging at the high-level programmers who had the gall to, e.g., use conditionals instead of a switch statement inside a function call a hundred frames deep.

We're looking solely at the few things they got wrong, and not the thousands of correct lines around them.

  • Thing is, these tools are so critical that even one error may cause systems to be compromised; rewriting them should never be taken lightly.

    (Ideally there would be formal verification tooling that could accurately test for all of the issues found in this review / audit, like the very timing-specific path changes, but that's a codebase of its own)

    • Is formal verification able to find most of these issues? I'm no expert on formal analysis, but I suspect most systems can't catch many of these errors. It seems more likely that the model will simply assume the file doesn't change between two syscalls - which is at the root of the majority of these issues. Modeling that possibility makes the formal model much harder to build.

  • When I read the article I came away with the impression that shipping bugs this severe in a rewrite of utils used by hundreds of millions of people daily (hourly?) isn’t ok. I don’t think brushing the bad parts off with “most of the code was really good!” is a fair way to look at this.

    Cloudflare crashed a chunk of the internet with a rust app a month or so ago, deploying a bad config file iirc.

    Rust isn’t a panacea, it’s a programming language. It’s ok that it’s flawed, all languages are.

    • I think that legitimate real-world issues in Rust code should be talked about more often. Right now the language enjoys a reputation that is essentially misleading marketing. It isn't possible to create a programming language that doesn't allow bugs to happen (even with formal verification you can still prove correctness based on a wrong set of assumptions). This weird, kind of religious belief that Rust leads to magically, completely bug-free programs needs to be countered and brought in touch with reality IMO.

      19 replies →

    • I find it hilarious that this comment is being downvoted.

      Exactly what is the controversial take here?

      > I don’t think brushing the bad parts off with “most of the code was really good!” is a fair way to look at this.

      Nope. this is fine.

      > Cloudflare crashed a chunk of the internet with a rust app a month or so ago, deploying a bad config file iirc.

      Maybe this?

      > Rust isn’t a panacea, it’s a programming language. It’s ok that it’s flawed, all languages are.

      Nope, this is fine too.

      17 replies →

Memory safety catches buffer overflows. CI catches logic bugs. Neither catches the Unix API gotchas nobody documented.

  • They're not API gotchas in most cases.

    And writing comprehensive tests for this behaviour is very difficult regardless of which language you are using.

    I am all for rust rewrites of things. But in this case, these are mistakes which were encouraged by the lazy design of `std::fs` and the developers' lack of relevant experience.

    And to clarify, I don't blame the developers for lacking the relevant experience. Working on such a project is precisely the right place to learn stuff like this.

    I think it's an absurdly dumb move by Canonical to take this project and beta-test it on normal users' machines though…

  • How does CI catch logic bugs?

    • That depends on what tests you are running. In any significant project you need a test suite so large that you wouldn't run all the tests before pushing to CI - instead you run the targeted tests that cover the area of code you changed, but there are broader "integration tests" that exercise your code and thus could break, which you don't actually run locally.

      You can also run some static analysis that takes too long to run locally every time, but once in a while it will point out "this code pattern is legal but is almost always a bug".

      It is also possible to do some formal analysis of code on CI that you wouldn't always run locally - I'm not an expert on these.

      1 reply →

Seems pretty impressive they rewrote the coreutils in a new language, with so little Unix experience, and managed to do such a good job with so few bugs or vulns. I would have expected an order of magnitude more at least.

Shows how good Rust is, that even inexperienced Unix devs can write stuff like this and make almost no mistakes.

  • Yes, it's the lack of Unix experience that's terrifying. So many of the mistakes listed are rookie mistakes, like not propagating the most severe errors, or the `kill -1` thing. Why were people who apparently did not have much experience using coreutils assigned to rewrite coreutils?

  • Rewriting perfectly good code was a colossal mistake.

    • Not necessarily, but was the reasoning sound, and have the tradeoffs been weighed? The website (https://uutils.github.io/) shows some reasonable "why"s (although I disagree with making "Rust is more appealing" a compelling reason, but that's just me (disclaimer: I don't like C and don't know Rust, so take this comment as you will)), but I think what's missing is how they will ensure both compatibility and security / edge case handling, which requires deep knowledge and experience in the original code and "tribal knowledge" of deep *nix internals.

    • I do wonder whether people got down the article enough to see the list of bugs patched in GNU coreutils.

      That "perfectly good code" that it sounds like no one should question included "split --line-bytes has a user controlled heap buffer overflow".

      1 reply →

    • The irony here being that GNU's coreutils themselves originated as rewrites, from back when BSD's copyright status was still legally unclear.

      1 reply →