Comment by tliltocatl
3 days ago
> Show me somebody who calls the IBM S/360 a RISC design, and I will show you somebody who works with the s390 instruction set today.
Ahaha so true.
But to answer the post's main question:
> Why do we even have linear physical and virtual addresses in the first place, when pretty much everything today is object-oriented?
Because backwards compatibility is more valuable than elegant designs. Because array-crunching performance is more important than safety. Because a fix for a V8 vulnerability can be deployed quickly while a hardware vulnerability fix cannot. Because you can express any object model on top of flat memory, but expressing one object model (or flat memory) in terms of another object model usually costs a lot. Because nobody ever agreed on what the object model should be. But most importantly: because "memory safety" is not worth the costs.
But we don't have a linear address space, unless you're working with a tiny MCU. For the last 30 years or so every mainstream processor has had a virtual address space, and we can mix and match pages the way we want, insulate processes from one another, add sentinel pages at the ends of large structures to generate a fault on overrun, and so on. We just happen to structure process heaps as linear memory, but that is not a hard requirement, even on current hardware.
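As a minimal sketch of that sentinel-page trick (my own example, assuming a POSIX system with mmap/mprotect; the helper name is made up): reserve one extra page past the end of a buffer and strip its permissions, so any overrun faults immediately.

    #include <stddef.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Allocate `len` bytes followed by one PROT_NONE guard page.
       Any access past the end of the buffer raises SIGSEGV. */
    static void *alloc_with_guard(size_t len) {
        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        size_t rounded = (len + page - 1) & ~(page - 1);

        void *base = mmap(NULL, rounded + page, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (base == MAP_FAILED)
            return NULL;

        /* Revoke all permissions on the trailing page: the sentinel. */
        if (mprotect((char *)base + rounded, page, PROT_NONE) != 0) {
            munmap(base, rounded + page);
            return NULL;
        }
        return base;
    }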
What we lack is the granularity that something like the iAPX 432 envisioned. Maybe some hardware breakthrough will allow for such granularity cheaply enough (as it did for signed pointers, for instance), so that smart compilers and OSes could offer even more protection without the expense of switching to kernel mode too often. I wonder what research exists in this field.
This feels like a pointless form of pedantry.
Okay, so we went from linear address spaces to partitioned/disaggregated linear address spaces. This is hardly the victory you claim it is, because page sizes keep increasing, and with them the minimum mappable block of memory. Within a page everything is linear as usual.
The reason linear address spaces are everywhere is that they are extremely cost-effective and fast to implement in hardware. You can do prefix matching to check whether an address points at a specific hardware device, and you can use multiplexers to address memory. Addresses can easily be encoded inside a single std_ulogic_vector. It's also possible to implement a Network-on-Chip architecture for your on-chip interconnect. It also makes caching easier, since an address maps straightforwardly onto a cache entry.
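A rough software model of that prefix-matching decode (the address map and names below are made up for illustration): each device claims a naturally aligned window, so selecting it is a single AND plus compare on the high bits of the address.

    #include <stddef.h>
    #include <stdint.h>

    /* Toy address decoder: one AND + compare per device window,
       which is cheap to build in hardware and to model in software. */
    struct region {
        uint32_t base;     /* must be aligned to the window size */
        uint32_t mask;     /* high bits that form the prefix */
        const char *name;
    };

    static const struct region memory_map[] = {
        { 0x00000000, 0xF0000000, "SRAM"   },
        { 0x40000000, 0xFFFF0000, "UART0"  },
        { 0x40010000, 0xFFFF0000, "TIMER0" },
    };

    static const char *decode(uint32_t addr) {
        for (size_t i = 0; i < sizeof memory_map / sizeof memory_map[0]; i++)
            if ((addr & memory_map[i].mask) == memory_map[i].base)
                return memory_map[i].name;
        return "bus error";
    }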
When you add a scan chain to your flip-flops, you're imposing an order on them and thereby building an implicit linear address space.
There is also the fact that databases with auto-incrementing integers as their primary keys use a logical linear address space, so the most obvious way to get a non-linear address space would be to use randomly generated IDs instead. It seems like a huge amount of effort would have to be spent to get away from the idea of linear address spaces.
We have a linear address space where we can map physical RAM and memory-mapped devices dynamically. Every core, at any given time, may have its own view of it. The current approach uses pretty coarse granularity, separating execution at the process level. The separation could be more granular.
The problem is the granularity of trust within the system. Were the MMU much faster and the TLB much larger (say, 128 MiB of dedicated SRAM), the granularity could be much finer, giving each function's stack a separate address space insulated from the rest of RAM. This is possible even now; it would just be impractically slow.
Any hierarchical (tree-based) addressing scheme is equivalent to a linear addressing scheme: pick any tree traversal algorithm. Any locally hierarchical addressing scheme can seemingly be implemented with (short) offsets in a linear address space; this is how most jumps in x64 and aarch64 are encoded, for instance.
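A concrete (if narrow) instance of that equivalence, using an Sv39-like three-level split as my own example: the hierarchical indices and the flat address are just two encodings of the same value.

    #include <stdint.h>

    /* A hierarchical address in a 3-level, 4 KiB-page scheme (Sv39-like):
       (level-2 index, level-1 index, level-0 index, byte offset).
       Flattening it is just fixed shifts -- the hierarchy and the linear
       address carry exactly the same information. */
    static uint64_t flatten(uint64_t i2, uint64_t i1, uint64_t i0, uint64_t off) {
        return (i2 << 30) | (i1 << 21) | (i0 << 12) | off;
    }

    static void split(uint64_t va, uint64_t idx[3], uint64_t *off) {
        idx[0] = (va >> 30) & 0x1FF;   /* 9 bits per level */
        idx[1] = (va >> 21) & 0x1FF;
        idx[2] = (va >> 12) & 0x1FF;
        *off   = va & 0xFFF;           /* 12-bit page offset */
    }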
it's entirely possible to implement segments on top of paging. what you need to do is add kernel abstractions for call gates that change segment visibility, and write some infrastructure to manage unions-of-a-bunch-of-little-regions. I haven't implemented this myself, but a friend did on a project we were working on together, and as a mechanism it works perfectly well.
getting userspace to do the right thing without upending everything is what killed that project
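This is not that project's code, just a user-space sketch of the general mechanism under POSIX assumptions (the struct and helpers are invented for illustration): a "segment" is a page-aligned region whose visibility is flipped at a call-gate-like boundary with mprotect.

    #include <stddef.h>
    #include <sys/mman.h>

    /* A "segment" emulated with a page-aligned mapping whose permissions
       are toggled when crossing a gate. A real design would do this in the
       kernel and batch many small regions; this only shows the mechanism. */
    struct segment {
        void  *base;   /* page-aligned */
        size_t len;    /* multiple of the page size */
    };

    static int segment_hide(struct segment *s) {
        return mprotect(s->base, s->len, PROT_NONE);
    }

    static int segment_show(struct segment *s, int writable) {
        return mprotect(s->base, s->len, PROT_READ | (writable ? PROT_WRITE : 0));
    }

    /* A trivial "call gate": make the segment visible only for the
       duration of the call, then hide it again. */
    static int with_segment(struct segment *s, void (*fn)(void *), void *arg) {
        if (segment_show(s, 1) != 0) return -1;
        fn(arg);
        return segment_hide(s);
    }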
There is also the problem of nested virtualization. If the VM has its own "imaginary" page tables on top of the hypervisor's page tables, the number of actual physical memory reads per translation goes from 4–6 to 16–36.
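For reference, the usual way the worst case is counted for radix-on-radix nested paging (exact numbers depend on the design and on walk caches; the function below is just my own illustration of the arithmetic): every guest page-table pointer is a guest-physical address that itself needs a full host walk.

    /* Worst-case memory reads for a nested walk with g guest levels and
       h host levels: each of the g guest entries plus the final
       guest-physical address needs an h-level host walk, plus the g
       guest reads themselves:
       (g + 1) * h + g = (g + 1) * (h + 1) - 1.
       For g = h = 4 that is 24 reads, versus 4 without nesting. */
    unsigned nested_walk_reads(unsigned g, unsigned h) {
        return (g + 1) * (h + 1) - 1;
    }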
If I understood correctly, you're talking about using descriptors to map segments. The issue with this approach is twofold: it is slow (a descriptor has to be created for each segment, and sometimes more than one, if you need both write and execute permissions), and there is a practical limit on the number of descriptors you can have: 8192 total, including call gates and whatnot. To extend this you need LDTs, which again each require a descriptor in the GDT and are themselves limited to 8192 entries. So on a modern desktop system you top out at roughly 8192 × 8192 ≈ 67 million segments, which would be both quite slow and, at the same time, quite limited.
But that wouldn't protect against out-of-bounds access (which is the whole point of segments), would it?
Indeed. Also, the TLB as it exists on x64 is not free, nor is it very large. A multi-level "TLB", where a process could pick an upper level covering a large stretch of lower-level pages and, e.g., allocate a disjoint micro-page for each stack frame, would be cool. But it takes a rather different CPU design.
"Please give me a segmented memory model on top of paged memory" - words which have never been uttered
> But we don't have a linear address space, unless you're working with a tiny MCU.
We actually do, albeit briefly: upon a cold start of the system, while the MMU is not yet active, no address translation is performed and the entire memory space is treated as a single linear, contiguous block (even if there are physical holes in it).
When the system is powered on, the CPU runs in privileged mode so that the operating system kernel can set up the MMU and enable it, which happens early in the boot sequence. Until then, virtual memory is not available.
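A concrete illustration of that hand-off, using RV64 RISC-V as my own example (the helper name is invented; the satp layout is from the privileged spec): until the kernel writes a translation mode into satp, every address the CPU issues is a physical one.

    #include <stdint.h>

    #define SATP_MODE_SV39  (8ULL << 60)   /* RV64: MODE field in satp[63:60] */

    /* Early boot on RV64: addresses are physical until this runs. Writing
       satp with a root page-table PPN and a mode turns translation on. */
    static inline void enable_paging(uint64_t root_pt_phys) {
        uint64_t satp = SATP_MODE_SV39 | (root_pt_phys >> 12);  /* PPN field */
        __asm__ volatile("csrw satp, %0" :: "r"(satp));
        __asm__ volatile("sfence.vma" ::: "memory");  /* flush stale translations */
    }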
Those holes can be arbitrarily large, though, especially in weirder environments (e.g., memory-mapped Optane and similar). A linear address space implies some degree of contiguity, I think.