Making a RISC-V Operating System Using Rust (2019)

3 years ago (osblog.stephenmarz.com)

What excites me most about RISC-V is the ability to experiment at a systems level. I love the Mill project, and they have a ton of interesting and clever ideas, but they're trying to innovate along so many dimensions at once that I feel it's almost a foregone conclusion that they'll fail. What I'd really like is for someone to design a chip with their ideas about backless memory and putting address translation at the cache/memory border, and do a RISC-V implementation of that. Then you could try getting a working OS up on it, and maybe incrementally try out other ideas rather than trying to change the world all at once.

We've had this paradigm for how memory, permissions, and OS resource management work since the PDP-11, and I think we ought to be looking at how to move beyond it.

http://millcomputing.com/wiki/Memory

Random RISC-V question: how do you make function calls in RV64?

I have read that 'call' in assembly is a pseudo-instruction, which uses auipc and jalr to build the destination address.

But surely that can only supply 32 bits of address? Is there a different sequence on 64 bit RISC-V? Or have I fundamentally misunderstood?

  • Direct function calls are usually done relative to the program counter (pc in auipc). So you get 32 bits of signed displacement, which is fine as long as the call target is within about ±2GB of the call site, and you get position-independent code for free. This is the same as on x86-64 and aarch64 too, which also have ~32 bits for call target immediates. (Also, compilers have "code model" options for choosing the size of code to support.)

    • I assume there's also support for a kind of calculated long jump done at ELF load time? I know that's done as part of the relocations on aarch64.

  • My understanding is that AUIPC, together with the 12-bit immediate of JALR, builds the destination address by adding a 32-bit offset to the current PC, i.e. it is relative, and that this works with both 32-bit and 64-bit addresses. The difference is that on 32-bit systems this gives immediate access to any address, while on 64-bit systems you are limited to this relative 32-bit window.

  • Either: "jal ra, dest" or "jalr ra, offset(reg)"

    Both save the address of the next instruction in ra (the return-address register, x1). The first is a PC-relative jump; the second jumps to the address held in a register plus an offset.

    Use "jalr x0, 0(ra)" to return

    Anything else is just syntactic sugar in the assembler to set up the reg for jalr
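
For the RV64 call question above, here is a minimal sketch in Rust of how the auipc + jalr pair arrives at a 64-bit target from a signed 32-bit offset. It models the arithmetic only and is not an emulator; the function names (`split_offset`, `auipc`, `jalr_target`, `call_target`) are made up for illustration.

```rust
/// Split a PC-relative offset into the 20-bit immediate for `auipc`
/// and the signed 12-bit immediate for `jalr`.
/// Returns None if the offset does not fit the auipc + jalr pair.
fn split_offset(offset: i64) -> Option<(i32, i32)> {
    // Low 12 bits, sign-extended, because jalr sign-extends its immediate.
    let lo12 = ((offset << 52) >> 52) as i32;
    // The upper part absorbs the borrow/carry caused by a negative lo12.
    let hi = offset.wrapping_sub(lo12 as i64) >> 12;
    // hi must fit the signed 20-bit field of auipc.
    if hi < -(1 << 19) || hi >= (1 << 19) {
        return None;
    }
    Some((hi as i32, lo12))
}

/// Model of `auipc rd, hi20` on RV64: rd = pc + sign_extend(hi20 << 12).
fn auipc(pc: u64, hi20: i32) -> u64 {
    pc.wrapping_add(((hi20 as i64) << 12) as u64)
}

/// Model of the jump target of `jalr rd, lo12(rs1)`:
/// (rs1 + sign_extend(lo12)) with the lowest bit cleared.
fn jalr_target(rs1: u64, lo12: i32) -> u64 {
    rs1.wrapping_add(lo12 as i64 as u64) & !1
}

/// Target reached by the two-instruction sequence starting at `pc`.
fn call_target(pc: u64, offset: i64) -> Option<u64> {
    let (hi20, lo12) = split_offset(offset)?;
    Some(jalr_target(auipc(pc, hi20), lo12))
}
```

The full 64-bit pc takes part in the addition, so the sequence works at any address; only the distance between call site and target is limited. For example, `call_target(0xFFFF_FFFF_8000_0000, 0x1234)` yields `Some(0xFFFF_FFFF_8000_1234)`, while an offset of 2^31 yields `None`.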

This is great. It is the right way forward. Trying to smush Rust into old operating systems is in my opinion (and I might be the only one with the opinion) not an optimal use of resources.

We need to build operating systems from the ground up in languages like Rust. We also need to let go of the UNIX way of doing things and move beyond it. It cannot possibly be the best there can ever be.

Build them for modern hardware and even more so future hardware we can see coming down the line. Build them with security in mind from the ground up. Things like ransomware could be made close to impossible by an operating system.

What will that be? I am not sure.

I am hoping for a lot of people to try a lot of new things and allow things to evolve.

  • > We also need to let go of the UNIX way of doing things and move beyond it.

    So...dump all existing unix/linux software in the bin and start writing things like browsers and desktop managers and office suites from scratch. Have fun...see you in 20-30 years.

    • Sounds like the entire Midori project only ran for 15, and iiuc it got all of that on top of a memory-safe kernel that was reinventing the language it was written in as the project was going.


  • A greenfield OS with real aspirations is a moonshot no matter what it's written in, so if we want to see systems-level investment in Rust that yields practical benefits...

Just a random thought: wasn't C first written specifically for the PDP series of minicomputers? And on the other side, the language evolved to write Unix.

It feels like we don't really have the same kind of match in modern programming languages. I wonder what a modern equivalent of C would look like if it were tailor-fitted to the RISC-V ISA and an OS targeted at that hardware.

It just is the case that one of the most successful programming languages of all time, C, was shaped by the existing hardware and the needs of the OS the language designers were building. But now we are going in the other direction by starting with an existing programming language and presumably shaping the OS to its strengths.

  • C (and I use the term loosely, not in the strict definition of standard C) is very well suited for low-level programming of RISC processors. It is not clear that a new language would be a big benefit. The benefit of C is to abstract from assembler and make Unix portable between architectures. C still works very well for that today.

    The big question is whether we still want to write new production code in C. For one, the C standard is horrible. Just about everything is undefined behavior. The second is that you don't want to spend a lot of time checking code for array bound errors, use after free, other problems with pointers, variables or data structures that are not (properly) initialized, etc.

    In addition, for large projects, all of the manual namespace management is also a bit of a bore.

    One big issue with C is that more or less, the CPU architecture has to treat memory as an array of bytes. There is just too much code that assumes that this is the case. Every data pointer can be converted to a void *, all function pointers are alike, etc. With Rust, there is a chance that radically new architectures are possible, if compatibility with C is no longer required.
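
To make the array-bounds point above concrete, here is a minimal sketch of how safe Rust surfaces an out-of-range access as a value the caller has to handle, instead of undefined behavior. The helper names `read_at` and `sum_prefix` are made up for illustration.

```rust
/// Checked read: returns None instead of reading past the end of `buf`.
fn read_at(buf: &[u8], index: usize) -> Option<u8> {
    buf.get(index).copied()
}

/// Sums the first `n` bytes, or returns None if `n` exceeds the buffer length.
fn sum_prefix(buf: &[u8], n: usize) -> Option<u32> {
    // `get(..n)` performs the bounds check once for the whole prefix.
    let prefix = buf.get(..n)?;
    Some(prefix.iter().map(|&b| u32::from(b)).sum())
}
```

Plain indexing (`buf[index]`) is also bounds-checked in safe Rust, but it panics on failure rather than returning `None`; which behavior is preferable inside a kernel is a design choice.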

    • > C ... is very well suited for low level programming of RISC processors.

      I wouldn't dispute that. You actually caught me expressing a raw and incomplete idea.

      I'm imagining a totally fictitious set of events. Kernighan and Ritchie sitting in front of a PDP-11 and thinking to themselves: we'd really like a way to get this hardware to support multiple users simultaneously. I mean, the impetus of the language was this transitive dependency from hardware -> user need -> OS -> language to build that OS.

      So I am looking at these new RISC-V SoCs that are coming out. I'm thinking about things like tenstorrent and their chiplet designs where you have a RISC-V CPU core surrounded by an ungodly amount of GPU-like tensor cores. Nvidia's Grace Hopper chip designs are kinda-sorta similar but with ARM cores.

      We have a brand new kind of system architecture on our hands. And we have a new kind of challenge, which is the need to efficiently use that hardware. Instead of multi-user sharing of a single mini-computer, we have to find a way to saturate the GPU-like tensor cores surrounding a RISC CPU.

      So we have a new chain with "new hardware" -> "new user need". So maybe we do need a "new OS" that isn't POSIX-ish. And if that chain follows, then I wonder what a "new language" would look like to complete it, because I doubt that either C or Rust is the right one.

      It seems we are grabbing tools that happen to be handy and applying them to a new situation without doing the same kind of first-principles, ground-up thinking that appears to have led to C. In other words, we're asking "what kind of OS can I build with this language?" instead of "what kind of language do I need to build this OS?"


    • > the C standard is horrible. Just about everything is undefined behavior

      How is this horrible? This is the core defining feature of C. It is the equivalent of saying that a phallic-shaped object with a sharp edge is terrible because you may cut or stab yourself and bleed to death, and that a dull equivalent is therefore (always) better. That object is known as a knife. Knives with extremely sharp blades exist because grinding at the thing you're trying to cut for hours isn't something all of us enjoy. Have you tried not hurting yourself before calling it terrible? Sure, one idiot may die because of it, and some unlucky guy may get killed with one, but really, I prefer using a real knife when cooking my food, and I will never change my mind about that until I die.

      > In addition, for large projects, all of the manual namespace management is also a bit of a bore.

      When you're writing in C, fun is the least of your worries. I can recommend C++ for big projects; no one but yourself forces the use of templates and classes. I can't speak for people writing for controllers made in 1984 with proprietary compilers that probably aren't even maintained anymore, but that's not my problem. For projects like Linux, C++ would be more than sufficient. It's ironic that Rust has been accepted into the Linux kernel, because what Linus said about C++ over a decade ago couldn't be more true for Rust as well, even though all you need to fix that is a bit of discipline, which is ironic given that C programming is practically 99.97% discipline.

      > One big issue with C is that more or less, the CPU architecture has to treat memory as an array of bytes.

      Not relevant. Virtual memory is used in practically all modern hardware, and "virtual" here means that you already see arrays and have no clue what the memory is truly like in hardware. Even kernels run in this mode, and all kernel code is written with this expectation. Before you make up such bullshit nonsense, you should tell us about this memory model that's superior to dumb arrays, because so far no one has thought of a better one.


  • It wouldn't look like anything because "modern" means bloated garbage that does less in more time. You guessed it, I'm talking about Rust, and when you're writing a kernel out of all things, Rust has infinite disadvantages and 0 advantages over C.

    I'm sure many people have a lot to say to counter this fact, but really, all I need to do is look at https://www.cvedetails.com/vulnerability-list/vendor_id-1902...: whoever begins writing a Rust kernel will reintroduce at least half of these, and the other half will be bugs you never even imagined, because of a false sense of security and a lower barrier to entry into a field that's pretty bare and very, very cruel.

    If you really believe that Rust fixes having to remember 300 different hardware types that you support and somehow will check that your code is safe for all of them at compile time, you're gravely mistaken.

    • > You guessed it, I'm talking about Rust, and when you're writing a kernel out of all things, Rust has infinite disadvantages and 0 advantages over C.

      I don't think that's a defensible claim, nor that your evidence is compelling. Rust has some disadvantages vs C, certainly (I'd start with the number of target hardware platforms, personally), but I doubt that they're meaningfully infinite. Likewise, Rust certainly has more than 0 advantages over C; just making the most common memory errors harder to hit would be fairly compelling on its own (personally I like Pascal and Ada for that, but apparently Rust hit the popularity jackpot, so oh well). As to Rust having CVEs... sure, it's not perfect, but I have to notice that that page shows no more than a handful of vulnerabilities per year, while, say, Linux has had far more vulnerabilities, many of them caused by the exact memory errors that Rust tries to fight.

      There are legitimate arguments against Rust (honestly, your initial claim of bloat sounds plausible), but unless I've missed something, this isn't the one I would pick.

This blog post is from 2019 (with updates from 2020). I guess Rust has matured since then, especially its RISC-V support. Does anyone have more insight into this?

  • Since this is a from-scratch implementation, there's not much support from the language side to begin with. Except for some build system details that may have been made less annoying, I don't think much has changed.

    • Ehhhh yes and no, you're mostly correct. The latest Rust edition is currently 2021.

      However, Rust versions everything, so even if you have a new compiler that removed support for something, you can still pull the old toolchains and build legacy codebases (like this one).

      My Rust OS not only pins the edition (which is the default anyway with `cargo init`) but also uses a toolchain file to pin the date of the nightly build I want contributors to build with.

      Rust tooling is really fantastic in this regard.
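
As an illustration of the toolchain pinning described above, a `rust-toolchain.toml` at the repository root might look roughly like this. The nightly date is a made-up placeholder, and the component and target lists are assumptions about what a bare-metal RISC-V project would want, not the commenter's actual file.

```toml
# rust-toolchain.toml (read by rustup when building inside this directory)
[toolchain]
# Hypothetical date; pin whichever nightly the project is known to build with.
channel = "nightly-2021-06-01"
# rust-src is commonly needed when rebuilding core for a custom target.
components = ["rust-src"]
# Bare-metal 64-bit RISC-V target.
targets = ["riscv64gc-unknown-none-elf"]
```

The edition is pinned separately, through the `edition` key in `Cargo.toml`.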


This is an alright tutorial, and Rust is still pretty stable, but the QEMU implementation of RISC-V supervisor mode has changed a little bit and you'll need to fiddle with PMP and the MMU a little more.

I also found that there's a fair amount of code Stephen never introduces or talks about on the videos, and just has dropped in the repo. This can make cobbling together your own version of this a little tougher, but maybe a fun experience.
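
For the PMP fiddling mentioned above, here is a hedged sketch in Rust of the value encodings involved, following the layout in the RISC-V privileged specification. Exactly what a given QEMU version requires is an assumption on my part; a common approach is to open memory to lower privilege levels with a PMP entry before leaving machine mode. The names below (`PmpMode`, `pmpcfg_byte`, `napot_addr`) are hypothetical, and the actual CSR writes need inline assembly on a RISC-V target, so they are left out.

```rust
/// Address-matching modes for the A field (bits 4:3) of a pmpcfg byte.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum PmpMode {
    Off = 0,
    Tor = 1,
    Na4 = 2,
    Napot = 3,
}

// Permission bits of a pmpcfg byte.
const PMP_R: u8 = 1 << 0;
const PMP_W: u8 = 1 << 1;
const PMP_X: u8 = 1 << 2;

/// Build one pmpcfg byte from permission bits and an address-matching mode.
/// The lock bit (bit 7) is left clear.
fn pmpcfg_byte(perms: u8, mode: PmpMode) -> u8 {
    (perms & 0b111) | ((mode as u8) << 3)
}

/// Encode a naturally aligned power-of-two region (at least 8 bytes)
/// as a pmpaddr value for NAPOT mode. Returns None for invalid regions.
fn napot_addr(base: u64, size: u64) -> Option<u64> {
    if size < 8 || !size.is_power_of_two() || base % size != 0 {
        return None;
    }
    // pmpaddr holds address bits [..:2]; trailing one bits encode the size.
    Some((base >> 2) | ((size >> 3) - 1))
}
```

For example, `pmpcfg_byte(PMP_R | PMP_W | PMP_X, PmpMode::Napot)` gives `0x1f`, and `napot_addr(0x8000_0000, 0x1000)` gives `Some(0x2000_01ff)` for a 4 KiB region at the usual QEMU `virt` RAM base.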

Relevant, also an OS written in and made possible by Rust: https://www.theseus-os.com/

Rust’s compile-time guarantees make an entirely different, single address space, single privilege level OS architecture possible. There is no kernel/userspace distinction. I hope it succeeds.