Two Paths to Memory Safety: CHERI and OMA


Doubtless Computing is apparently the blogger's new startup.

He had previously co-founded the CPU startup VyperCore, which was based on technology from his PhD thesis at the University of Bristol. The startup folded earlier this year.[1]

VyperCore's main selling point was that garbage collection in hardware could run managed languages faster and with less energy. Apparently they got as far as running code on an FPGA. Apart from the object-addressing hardware, it was based on a RISC-V core.

I wonder which assets (patents and otherwise) the new company has been able to get from the old.

Both companies were/are apparently targeting the data centre first... which I think is a bit bold, TBH. I follow RISC-V, and have seen maybe a dozen announcements of wide-issue core designs that on paper could have been competitive with AMD and Intel... if only they had been manufactured on a competitive process. But that would require significant investment.

1. https://ednutting.substack.com/p/vypercore-failed

  • A pretty reasonable summary of things from an outside perspective - have my thumbs up ;)

    (And a very good question, to be answered at a later stage.)

    • Almost all of these projects fail for market reasons. Customers want more performance, cheap parts, or legacy compatibility. They'll say they'll buy secure chips or OSes, right up until some tradeoff is required with a desired application. Then they cancel it, and the supplier is left with a huge loss.

      I hope you succeed. I also thank you for a detailed write-up that listed good offerings from your competitors. That's more honest and fair than most startup writing. ;)

      With compute-oriented hardware, have you considered making a prepackaged, multicore version that runs on Amazon F1 FPGAs? Then anyone could test or use it in the cloud.

      That would be too expensive for basic web/app servers to use. However, some companies might use it for databases, key servers, or (in the defense market) high-speed guards, which already cost a fortune.

      With FPGAs, one might also make load balancers with firewalls and SSL acceleration, since buyers would compare the price against SSL accelerators. Also gateways for AI interactions, which are in high demand right now.

      Just some ideas for you to toy with.

The MMU leads to horribly leaky operating system abstractions. IME it's leaky due to the lack of separation between address space remapping (preventing fragmentation) and memory protection (security).

Perhaps unintentionally, RISC-V provides more flexibility to kernel developers by also including a physical memory protection unit that can run underneath and simultaneously with the MMU. This can make it far cheaper to switch memory protection on and off for arbitrarily sized areas of physical memory since this capability is no longer needlessly coupled to expensive memory remapping operations. Kernels can move away from the typical “process is its own virtual computer” model, and towards things closer to single address space designs or some middle ground between the two.
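
As a concrete sketch (bare-metal M-mode on RV64 with GCC inline assembly; the entry index, base and size are placeholder assumptions), flipping protection on a region then means programming a single PMP entry, with no remapping involved:

    /*
     * Minimal sketch: programming one RISC-V PMP entry from M-mode.
     * base must be size-aligned and size a power of two >= 8 bytes.
     */
    #include <stdint.h>

    #define PMP_R     (1u << 0)   /* read permission                        */
    #define PMP_W     (1u << 1)   /* write permission                       */
    #define PMP_X     (1u << 2)   /* execute permission                     */
    #define PMP_NAPOT (3u << 3)   /* naturally aligned power-of-two region  */

    static void pmp_deny_region(uintptr_t base, uintptr_t size)
    {
        /* NAPOT encoding: the low bits of pmpaddr encode the region size. */
        uintptr_t addr = (base | ((size >> 1) - 1)) >> 2;

        /* No R/W/X bits set: S/U-mode accesses to the region now fault,
         * regardless of what the MMU's page tables say. Note that csrw
         * rewrites all eight config bytes held in pmpcfg0. */
        __asm__ volatile ("csrw pmpaddr0, %0" :: "r"(addr));
        __asm__ volatile ("csrw pmpcfg0, %0"  :: "r"((uintptr_t)PMP_NAPOT));

        /* Synchronize cached translations with the new PMP settings, as
         * the privileged spec requires when paging is enabled. */
        __asm__ volatile ("sfence.vma" ::: "memory");
    }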

  • Thinking of bunnie Huang's open source hardware efforts, a 28nm nommu system seems like a good long-term open source target. How much of the system's complexity is the MMU, and so how much complexity could we cut out while still having the ability to run third-party untrusted code?

  • (1) Do you have benchmarks for the RISC-V "physical memory protection unit", and/or where can I read more? I'm ideally looking for things like type-1/type-2 hypervisor or kernel tutorials for RISC-V that exemplify the technical trade-offs. (2) Separating virtual memory from security sounds reasonable as a way to eventually offload security to simpler, verified (and ideally eventually synthesized) hypervisors instead of complex kernels, but I'm wondering about capability and debugging limits. (3) The last sentence is very speculative, and I don't see how that could be reached.

    • You can find more in the RISC-V privileged specification[1], section 3.7. I don't have any benchmarks, and I think no such generalized benchmarks exist, since it's a specification and every core brings its own implementation (or none; it's optional). With that said, it's simple and probably effectively zero overhead, but it's also much less capable than what an MMU can do. It's "protect some firmware against the OS" or "the absolute minimum hardware for some memory protection in a cheap MCU", not a competitor to full-fat virtual memory.

      [1]: https://docs.riscv.org/reference/isa/_attachments/riscv-priv...

  • What are examples of the MMU leading to poor abstractions? I agree it's not ideal and encourages some poor abstractions, but I think it's more that mainstream operating systems have failed to innovate sufficiently. For example, the 1:1 mapping between address spaces and processes doesn't have to be the case, but it is, I suppose, the most obvious use. (There is vfork(...), heh - see the sketch below.)
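
    A minimal sketch of that aside, assuming a POSIX system: the vfork() child briefly runs inside the parent's address space until it execs or exits, so "one process, one address space" is already not airtight.

        #include <sys/types.h>
        #include <unistd.h>

        int main(void)
        {
            pid_t pid = vfork();
            if (pid == 0) {
                /* The child executes in the parent's address space (the
                 * parent is suspended); it may only exec or _exit. */
                execl("/bin/true", "true", (char *)0);
                _exit(127);   /* reached only if exec failed */
            }
            return pid < 0;   /* parent resumes once the child execs/exits */
        }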

  • > lack of separation between address space remapping (preventing fragmentation) and memory protection (security)

    Maybe segments weren't such a bad thing.

Three paths: SPARC Application Data Integrity (ADI)

https://docs.oracle.com/en/operating-systems/solaris/oracle-...

Though I do concede most folks aren't keen on picking up anything related to Oracle or Solaris nowadays.

  • I haven't come across this specific feature before. From reading about it, it seems closely related to the Arm (E)MTE ISA extensions (Memory Tagging Extension)?

    What's interesting is that the approach (software-defined 'random' numbers associating memory regions with valid pointers) provides only probabilistic memory safety: a malicious actor may find a way to spoof or guess the tag needed to access a particular piece of memory (sketched below). Given that Arm MTE has been breached in the last year, it's hard to argue it's a good enough security guarantee. EMTE may fix some issues (e.g. side channels) but leaves open the probabilistic pathway (i.e. "guess the tag"), a hole MTE isn't designed to close - so a software breach on top of a chip with EMTE can't necessarily be argued to be a violation of the hardware's security properties, though it may exploit the architectural hole.

    In contrast, CHERI and OMA (Object Memory Architecture) both provide hardware-enforced guarantees of memory safety properties - unbreakable even if the attacker has perfect knowledge - backed by formal proofs of these claims.

    CHERI offers referential and spatial safety as hardware guarantees, with temporal being achievable in software. OMA offers referential, spatial and temporal safety as hardware guarantees.
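
    To make the probabilistic point concrete, here is a schematic model of an MTE-style tag check - not the real intrinsics; the tag width and placement here are simplified assumptions:

        #include <stdbool.h>
        #include <stdint.h>

        #define TAG_BITS  4    /* MTE tags are 4 bits               */
        #define TAG_SHIFT 56   /* carried in the pointer's top byte */

        /* The hardware conceptually checks pointer tag == memory-granule
         * tag on every load/store. */
        static bool access_allowed(uint64_t ptr, uint8_t granule_tag)
        {
            return ((ptr >> TAG_SHIFT) & 0xF) == granule_tag;
        }

        /* An attacker forging a pointer without knowing the tag still gets
         * through with probability 1 / 2^TAG_BITS = 1/16 per attempt -
         * nothing in the scheme rules the guess out, which is exactly the
         * probabilistic hole discussed above. */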

    • Kind of, with the difference that it has been in production since 2015 on Solaris SPARC systems - granted, they aren't as widespread as they once were.

      Sometimes the perfect is the enemy of the good; none of the memory tagging solutions has achieved widespread mainstream adoption outside iDevices.

      Google apparently doesn't want to anger Android OEMs by making it an Android requirement, so it remains a Pixel-only feature.

      CHERI and OMA are still going to take years to reach mainstream adoption, if it ever comes to that.

      I had hopes that whatever Microsoft was doing with CHERIoT would eventually come to Windows in some fashion, but the best that has happened seems to be the adoption of Pluton in Copilot+ PCs, which serves a different purpose anyway.

    • Can you please provide sources about Arm EMTE being breached? I couldn’t find any information online.

Could we also consider just not connecting critical systems to the internet at large? No reason, for example, for the Jaguar assembly line to depend on an internet connection.

" Rather than extending paged memory, OMA implements object-based memory management directly in hardware. Every allocation becomes a first-class hardware object with its own identity, bounds, and metadata maintained by the processor itself."

What is this supposed to mean? Like a whole new ISA + kernel + userland?

  • My interpretation, going off the linked integrated-GC research: extensions to the ISA and thus the compiler backend, no modifications to 'well-formed' applications, and some changes to the part of the language runtime dealing with memory management.

    Unless the CPU becomes some kind of hybrid monster with both OMA and a traditional paged MMU, you will need to make changes to the kernel. You may be able to emulate some of the kernel's page-table shenanigans with the object-based system, but I don't think the kernel's C code is typically 'well-formed'. It's probably a lot of engineering effort to make the necessary kernel changes, but so are all those complex kernel-hardening efforts that hardware-level memory safety like OMA would render moot.
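
    For illustration, here is one rough software model of that reading. Every name is hypothetical - in OMA this bookkeeping would live in the processor, not a C table - so treat it as a sketch of the idea, not the design (error handling elided):

        #include <stdio.h>
        #include <stdlib.h>

        typedef struct { void *base; size_t size; } object_meta;
        typedef struct { size_t id; size_t offset; } objref;  /* "fat" reference */

        static object_meta table[256];  /* stand-in for per-object hardware state */
        static size_t next_id;

        /* Allocation creates an object with identity and bounds. */
        static objref obj_alloc(size_t size)
        {
            table[next_id] = (object_meta){ malloc(size), size };
            return (objref){ next_id++, 0 };
        }

        /* Each dereference is bounds-checked against the object's metadata -
         * the check the quoted paragraph says the processor itself performs. */
        static void *obj_deref(objref r, size_t len)
        {
            object_meta m = table[r.id];
            if (r.offset + len > m.size) {
                fprintf(stderr, "object fault: id %zu\n", r.id);
                abort();
            }
            return (char *)m.base + r.offset;
        }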

Unlikely that new HW will be the solution.

You can have a memory-safe Linux userland today on stock hardware. https://fil-c.org/pizlix
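
As a sketch of what that buys you: the classic use-after-free below is silent undefined behaviour in ordinary C, but a capability-checked implementation like Fil-C is designed to trap it at runtime (the exact diagnostic is Fil-C's own; this just shows the shape of the bug).

    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char *buf = malloc(16);
        strcpy(buf, "hello");
        free(buf);
        buf[0] = 'H';   /* use-after-free: silent corruption (or worse) in
                           ordinary C; a capability-checked implementation
                           like Fil-C halts with a runtime panic instead */
        return 0;
    }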

  • Fil-C is basically CHERI in software, with large speed and memory overhead.

    • Fil-C running on any x86 box is faster than any CHERI implementation that has ever existed.

      That's likely to be true in embedded also, just because of the relationship between volume and performance in silicon. Fil-C runs on the high volume stuff, so it'll get better perf.

      CHERI doubles the size of pointers, so it's not like it has a decisive advantage over Fil-C.
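
      Roughly, the doubling is because a capability carries its metadata inline. A loose sketch - the exact compressed field layout varies by CHERI architecture, and the 1-bit validity tag is stored out of band:

          #include <stdint.h>

          /* Sketch of a 128-bit CHERI capability replacing a 64-bit
           * pointer; cleared tag bits invalidate forged capabilities. */
          typedef struct {
              uint64_t address;    /* the integer address itself       */
              uint64_t metadata;   /* compressed bounds + permissions  */
          } cheri_capability;      /* sizeof == 16: double a pointer   */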


    • But it's seemingly on track to move from "large" to "significant" speed and memory overhead. It is already really useful, especially for running tests in pipelines.

    • > Fil-C is basically CHERI in software

      It's not, actually.

      Fil-C is more compatible with C/C++ than CHERI, because Fil-C doesn't change `sizeof(void*)`.

      Fil-C is more compatible in the sense that I can get CPython to work under Fil-C, and to my knowledge it doesn't work on CHERI.

      Fil-C also has an actual story for use-after-free; CHERI's story is super weak.
