
Comment by mpweiher

3 days ago

> Why do we even have linear physical and virtual addresses in the first place, when pretty much everything today is object-oriented?

Because the attempts at segmented or object-oriented address spaces failed miserably.

> Linear virtual addresses were made to be backwards-compatible with tiny computers with linear physical addresses but without virtual memory.

That is false. In the Intel World, we first had the iAPX 432, which was an object-capability design. To say it failed miserably is overselling its success by a good margin.

The 8086 was sort-of segmented to get 20-bit addresses out of a 16-bit machine; it was a stop-gap, and a huge success. The 80286 did things "properly" again and went all-in on segments when going to virtual memory...and sucked. Best I remember, it was used almost exclusively as a faster 8086, with 80286 protected mode used to page memory in and out and the "reset and recover" hack to then get back to real mode for real work.
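For reference, the 8086 trick is one line of arithmetic: the 16-bit segment is shifted left by four and added to the 16-bit offset, giving a 20-bit physical address. A minimal sketch (the helper name is just mine for illustration):

    #include <stdint.h>

    /* 8086 real-mode addressing: segment << 4 plus offset yields a
       20-bit physical address, i.e. at most 1 MiB. */
    uint32_t phys_addr(uint16_t segment, uint16_t offset) {
        return ((uint32_t)segment << 4) + offset;
    }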

The 80386 introduced the flat address space and paged virtual memory not because of backwards-compatibility, but because it could and it was clearly The Right Thing™.

The Burroughs large-systems architecture of the 1960s and 1970s (B6500, B6700, etc.) did it. Objects were called “arrays” and there was hardware support for allocating and deallocating them in Algol, the native language. These systems were initially aimed at businesses, I believe (for example, Ford was a big customer, for things such as managing which parts were flowing where), but later they managed to support FORTRAN with its unsafe flat model.

These were large machines (think of a room 20 m square), and with explicit hardware support for Algol operations, including the array handling and display registers for nested functions, they were complex and power-hungry, with a lot to go wrong. Eventually, with the technology of the day, they became uncompetitive against simpler architectures. By that time, too, people wanted to program in languages like C++ that were not supported.

With today’s technology, it might be possible.

> The 80386 introduced the flat address space ...

This may be misleading: the 80386 introduced a flat address space and paged virtual memory _in the Intel world_, not in general. At the time it was introduced, a linear/flat address space was the norm for 32-bit architectures, with examples such as the VAX, the MC68K, the NS32032 and the new RISC processors. The IBM System/360 was also (mostly) linear.

So with the 80386, Intel finally abandoned their failed approach of segmented address spaces and joined the linear rest of the world. (Of course the 386 is technically still segmented, but let's ignore that).

And they made their new CPU conceptually compatible with the linear address space of the big computers of the time, the VAXen and IBM mainframes and Unix workstations. Not the "little" ones.

  • The important bit here is "their failed approach": just because Intel made a mess of it doesn't mean that the entire concept is flawed.

    (Intel is objectively the luckiest semiconductor company, in particular if one considers how utterly incompetent their own "green-field" designs have been.

    Think for a moment how lucky a company has to be for the major competitor it tried to kill with all means available, legal and illegal, to save it, after it bet the entire farm on the Itanic.)

    • It isn't 100% proof that the concept is flawed, but the fact that the CPU manufacturer that was for decades the most successful in the world couldn't make segmentation work across multiple attempts is pretty strong evidence that there are, er, "issues" that aren't immediately obvious.

      I think it is safe to assume that they applied what they learned from their earlier failures to their later failures.

      Again, we can never be 100% certain of counterfactuals, but certainly the assertion that linear address spaces were only there for backwards compatibility with small machines is simply historically inaccurate.

      Also, Intel weren't the only ones. The first MMU for the Motorola MC68K was the MC68451, a segmented MMU. It was later replaced by the MC68851, a paged MMU. The MC68451, and segmentation with it, was rarely used and then discontinued. The MC68851 was comparatively widely used, and later integrated in simplified form into subsequent CPUs like the MC68030 and its successors.

      So there as well, segmentation was tried first and then later abandoned. Which again, isn't definitive proof that segmentation is flawed, but way more evidence than you give credit for in your article.

      People and companies again and again start out with segmentation, can't make it work and then later abandon it for linear paged memory.

      My interpretation is that segmentation is one of those things that sounds great in theory but doesn't work nearly as well in practice. Just thinking about it in the abstract, making an object boundary also a physical, hardware-enforced protection boundary sounds absolutely perfect to me! For example, something like the LOOM object-based virtual memory system for Smalltalk (though that one was implemented more in software).

      But theory ≠ practice. Another example of something that sounded great in theory was SOAR: Smalltalk on a RISC. They implemented a good part of the expensive bits of Smalltalk in silicon in a custom RISC design. It worked, but the benefits turned out to be minimal. What actually helped were larger caches and higher memory bandwidth; in other words, RISC.

      Another example was the Rekursiv, which also had object-capability addressing and a lot of other OO features in hardware. Also didn't go anywhere.

      Again: not everything that sounds good in theory also works out in practice.


  • > So with the 80386, Intel finally abandoned their failed approach of segmented address spaces and joined the linear rest of the world. (Of course the 386 is technically still segmented, but let's ignore that).

    That seems an odd interpretation of how they extended the 286 protected mode on the 386. The 286 converted the segment registers from a fixed base plus 64K size into 'selectors' into the LDT/GDT, which added permissions etc. via a segment descriptor structure; the descriptor, along with the segment 'base', was transparently cached in generally invisible portions of the register. The problem with this approach was the same as with CHERI etc.: it requires a fat pointer comprising segment+offset, which to this day remains problematic with standard C, where certain classes of programmers expect that sizeof(void *) == sizeof(int) (or long).

    Along comes the 386, which extends the segment descriptor: the base grows to 32 bits and the limit field to 20 bits, with a granularity bit so the limit can count either bytes or 4K pages.

    And of course it added the ability to back linear addresses with paging, if enabled.

    It's entirely possible to run the 386 in an object=segment only mode where each data structure lives in its own segment descriptor and the hardware enforces range checking, and heap compaction etc. can happen automatically by simply copying the segment to another linear address and adjusting the base address. By today's standards the number of outstanding segment descriptors is limiting, but remember, in 1985 a megabyte of RAM was a pretty reasonable amount...

    The idea that someone would create a couple of descriptors with base=0:limit=4G and set all the segment registers to them, in order to assure that int == void*, is sorta a known possible misuse of the core architecture. Of course this basically requires paging, as the processor then needs to deal with the fact that it likely doesn't actually have 4G of RAM, and the permission model is then enforced at a 4K granularity, leaving open all the issues C has with buffer overflows, the mixing of code and data permissions, etc. It's not a better model, just one that is easier to reason about initially but then, for actually robust software, starts to fall apart for long-running processes due to address space fragmentation and a lot of other related problems.
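    For the curious, here is roughly what such a descriptor looks like. The field layout below is the documented 386 format; the decoding helpers are just my own sketch. The "flat" setup described above is simply base=0, limit=0xFFFFF with the granularity bit set, i.e. 4 GiB:

        #include <stdint.h>

        /* 386 segment descriptor: 8 bytes, with base and limit split
           across the structure for 286 compatibility. */
        struct seg_desc {
            uint16_t limit_lo;     /* limit bits 0-15 */
            uint16_t base_lo;      /* base bits 0-15 */
            uint8_t  base_mid;     /* base bits 16-23 */
            uint8_t  access;       /* type, DPL, present bit */
            uint8_t  limit_flags;  /* limit bits 16-19 + flags (G, D/B, ...) */
            uint8_t  base_hi;      /* base bits 24-31 */
        };

        uint32_t desc_base(const struct seg_desc *d) {
            return d->base_lo | ((uint32_t)d->base_mid << 16)
                              | ((uint32_t)d->base_hi  << 24);
        }

        /* If the G (granularity) bit is set, the 20-bit limit counts
           4K pages rather than bytes. */
        uint32_t desc_limit(const struct seg_desc *d) {
            uint32_t limit = d->limit_lo
                           | ((uint32_t)(d->limit_flags & 0x0F) << 16);
            if (d->limit_flags & 0x80)          /* G bit */
                limit = (limit << 12) | 0xFFF;  /* pages -> bytes */
            return limit;
        }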

    AKA, it wasn't necessarily the best choice, and we have been dealing with the repercussions of lazy OS/systems programmers for the 40 years since.

    PS: Intel got (and gets) a lot of hate from people wanting to rewrite history by ignoring the release dates of many of these architectural advancements. For example, the entire segment register 'fiasco' is a far better solution than the banked memory systems available in most other 8/16-bit machines. The 68000 came fully a year later, and made no real attempt at being backwards compatible with the 6800, unlike the 8086, which was clearly intended to be a replacement for the 8080.

    • You did see that bit you quoted?

      > (Of course the 386 is technically still segmented, but let's ignore that)

      Yes, the 80386 was still technically segmented, but the overwhelming majority of operating systems (95%+) effectively abandoned segmentation for memory protection and organization, except for very broad categories such as kernel vs. user space.

      Instead, they configured the 80386's segmentation to provide a large linear address space for user processes (and usually for the kernel as well).

      > The idea that someone would create a couple of descriptors with base=0:limit=4G and set all the segment registers to them, in order to assure that int == void*, is sorta a known possible misuse of the core architecture

      The thing that you mischaracterize as a "misuse" of the architecture wasn't just some corner case that was remotely "possible"; it was what 95% of the industry did.

      The 8086 wasn't so much a design as a stop-gap hail-mary while the iAPX 432 floundered. And the VAX already existed before the 8086.


> Because the attempts at segmented or object-oriented address spaces failed miserably.

> That is false. In the Intel World, we first had the iAPX 432, which was an object-capability design. To say it failed miserably is overselling its success by a good margin.

I would further posit that segmented and object-oriented address spaces have failed, and will continue to fail, for as long as we have a separation into two distinct classes of storage: ephemeral (DRAM) and persistent backing store (disks, flash storage, etc.). The alternative is a single, unified concept of nearly infinite (at least logically, if not physically), always-on memory in which everything is, essentially, an object.

Intel's Optane has given us a brief glimpse into what such a future could look like but, alas, that particular version of the future has not panned out.

Linear address space makes perfect sense for size-constrained DRAM, and makes little to no sense for the backing store, where a file system is instead entrusted with implementing an object-like address space (files and directories are the objects, and the file system is the address space).

Once a new, successful memory technology emerges, we might see a resurgence of the segmented or object-oriented address space models, but until then, it will remain a pipe dream.

  • I don't see how any amount of memory technology can overcome the physical realities of locality. The closer you want the data to be to your processor, the less space you'll have to fit it. So there will always be a hierarchy where a smaller amount of data can have less latency, and there will always be an advantage to cramming as much data as you can at the top of the hierarchy.

    • While that's true, CPUs already have automatically managed caches. It's not too much of a stretch to imagine a world in which RAM is automatically managed as well and there's no distinction between RAM and persistent storage. In a spinning-rust world that never would have been possible, but with modern NVMe it's plausible.


    • «Memory technology» as in «a single tech» that blends RAM and disk into just «memory» and obviates the need for the disk as a distinct concept.

      One can conjure up RAM that has become exabytes large and that does not lose data after a system shutdown. In such a unified memory model, everything is local, promptly available to the CPU, and directly addressable by it.

      Please do note that multi-level CPU caches still have their place in this scenario.

      In fact, this has been successfully done in the AS/400 (or iSeries), which I have mentioned elsewhere in the thread. It works well and is highly performant.


  • I think seamless persistent storage is also bound to fail. There are significant differences in how we treat ephemeral objects in programs versus persistent storage. Ephemeral objects are low-value: if something goes wrong, we can just restart and recover from storage. Persistent storage is often high-value: we make significant efforts to guarantee its consistency and durability even in the presence of crashes.

  • I shudder to think about the impact of concurrent data structures fsync'ing on every write because the programmer can't reason about whether the data is in memory, where a handful of atomic fences/barriers are enough to reason about the correctness of the operations, or on disk, where those operations simply do not exist.

    Also linear regions make a ton of sense for disk, and not just for performance. WAL-based systems are the cornerstone of many databases and require the ability to reserve linear regions.
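    To make the asymmetry concrete, a minimal sketch (the names publish/persist and the record type are invented for illustration): handing a record to another thread needs one release store, while making the same record durable needs a write plus a full fsync round-trip.

        #include <stdatomic.h>
        #include <stddef.h>
        #include <unistd.h>

        struct record;   /* some application data */

        /* In-memory hand-off: a release store is all another thread
           needs (paired with an acquire load on the reader side). */
        _Atomic(struct record *) shared;

        void publish(struct record *r) {
            atomic_store_explicit(&shared, r, memory_order_release);
        }

        /* Durable hand-off: no fence-like primitive exists; the data
           must be written and then flushed with a syscall. */
        int persist(int fd, const void *buf, size_t len) {
            if (write(fd, buf, len) != (ssize_t)len)
                return -1;
            return fsync(fd);
        }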

    • Linear regions are mostly a figment of imagination in real life, but they are a convenient abstraction.

      Linear regions are nearly impossible to guarantee, unless the underlying hardware has specific, controller-level provisions.

        1) For RAM, the MMU will obscure the physical address of a memory page, which can come from a completely separate memory bank. It is up to the VMM implementation and its heuristics to ensure contiguous allocation, to coalesce unrelated free pages into a new, large allocation, or to map in a free page from a «distant» location.

        2) Disks (the spinning-rust variety) are not that different. A freed block can be handed out from the start of the disk. However, a sophisticated file system like XFS or ZFS, and others like them, will do its best to allocate contiguous blocks.

        3) Flash storage (SSDs, NVMe) simply «lies» about the physical blocks, and does so for a few reasons (garbage collection and the transparent reallocation of ailing blocks, to name two). As far as I understand, the physical «block» numbers are hidden from the host entirely; only the flash controller and its firmware ever see them.

      The only practical way I can think of to ensure guaranteed contiguous allocation of blocks unfortunately involves a conventional hard drive with a dedicated partition created just for the WAL. In fact, this is how Oracle installations used to work – they could use a dedicated raw device to bypass both the VMM and the file system.
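      Short of that, the best a regular file system offers is a hint: preallocating the whole WAL up front gives an extent-based file system such as XFS a chance to hand back a few large extents. A sketch (best-effort only; contiguity is never guaranteed):

        #include <fcntl.h>
        #include <unistd.h>

        /* Preallocate a 1 GiB WAL file in a single call, so the file
           system can reserve large extents instead of growing the
           file piecemeal. Contiguity remains best-effort. */
        int make_wal(const char *path) {
            int fd = open(path, O_CREAT | O_WRONLY, 0600);
            if (fd < 0)
                return -1;
            if (posix_fallocate(fd, 0, 1LL << 30) != 0) {
                close(fd);
                return -1;
            }
            return fd;
        }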

      When RAM and disk(s) are logically the same concept, the WAL can be treated as an object of the «WAL» type, with certain properties specific to that object type alone to support the WAL's peculiarities.


    • OTOH, WAL systems are only necessary because storage devices present an interface of linear regions; the WAL machinery could move into the hardware.

> > Linear virtual addresses were made to be backwards-compatible with tiny computers with linear physical addresses but without virtual memory.

> That is false. In the Intel World, we first had the iAPX 432, which was an object-capability design. To say it failed miserably is overselling its success by a good margin.

That's not refuting the point he's making. The mainframe-on-a-chip iAPX family (and Itanium after it) died and had no heirs. The current popular CPU families are all descendants of the stop-gap 8086, which evolved from the tiny-computer CPUs, or of ARM's straight-up embedded CPU designs.

But I do agree with your point that a flat (global) virtual memory space is a lot nicer to program. In practice we've been moving away from that again fast, though; the kernel has to struggle to keep up the illusion: NUCA, NUMA, CXL.mem, various mapped accelerator memories, etc.

Regarding the iAPX 432, I do want to set the record straight, as I think you are insinuating that it failed because of its object memory design. The iAPX failed mostly because of its abysmal performance characteristics, but that was, in retrospect [1], not inherent to the object directory design. It lacked very simple look-ahead mechanisms, had no instruction or data caches, no registers, and not even immediates. Performance did not seem to be a top priority in the design, to paraphrase one of the architects. Additionally, the compiler team was not aligned and failed to deliver on time, which only compounded the performance problem.

  - [1] https://dl.acm.org/doi/10.1145/45059.214411

  • The way you selectively quoted: yes, you removed the refutation.

    And regarding the iAPX 432: it was slow in large part due to the failed object-capability model. For one, the model required multiple expensive lookups per instruction. And it required tremendous numbers of transistors: so many that, despite forcing a (slow) multi-chip design, there still wasn't enough transistor budget left over for performance-enhancing features.

    Performance-enhancing features that contemporary designs with smaller transistor budgets, but no object-capability model, did have.

    Opportunity costs matter.

    • I agree on the opportunity cost, and that given the transistor budgets of the time a simpler design would have served better.

      I fundamentally disagree with putting the majority of the blame on the object memory model. The problem was that they compounded the added complexity of the object model with a slew of other unnecessary complexities. They somehow did find the budget to put the first full IEEE floating-point unit on the execution unit, and implemented a massive[1] decoder and microcode for the bit-aligned, 200+ instruction set and for interprocess communication. The expensive lookups per instruction had everything to do with cutting caches and programmer-visible registers, not with any kind of overwhelming complexity in the address translation.

      I strongly recommend checking the "Performance effects of architectural complexity in the Intel 432" paper by Colwell that I linked in the parent.

      [1] die shots: https://oldbytes.space/@kenshirriff/110231910098167742

    • A huge factor in the iAPX 432's utter lack of success was the technological restrictions, like pin-count limits, laid down by Intel's top brass, which forced stupid and silly limitations on the implementation.

      That's not to say that the iAPX 432 would have succeeded under better management, but only that you cannot point to some random part of the design and say "that obviously does not work".

For object-oriented addressing, you have the IBM iSeries / AS/400 systems, which used an object-capability model (as far as I understand it): a refinement and simplification of what was pioneered in the less successful System/38.

For linear, you have the Sun SPARC processor coming out in 1986, the same year the 386 shipped in volume. I think Unix's use of linear addressing made it more popular (the MIPS R2000 also came out in January 1986).

  • > IBM iSeries / AS/400

    Aren't the AS/400s closer to a JVM than to an iAPX 432 in their implementation details? Sans IBM's proprietary lingo, TIMI is just a bytecode virtual machine and was designed as such from the beginning. Then again, on a microcoded CPU it's hard to tell the difference.

    > I think the use by Unix of linear made it more popular

    More like the linear address space was the only logical solution since the EDVAC. Then in the late '50s the Manchester Atlas invented virtual memory to abstract away the magnetic drums. Some smart minds realized what we actually want is segment/object addressing (Robert S. Barton with his B5000, which was a direct influence on the JVM; but was he the first one?). Multics/GE-600 went with segments (I couldn't find any evidence they were directly influenced by the B5000, but it seems so).

    The System/360, which was the pre-Unix lingua franca, went with a flat address space. I guess the IBM folks wanted to go as conservative as possible. They also wanted the S/360 to compete in HPC, so performance was quite important, and segment addressing doesn't give you that. Then VM/370 showed that flat/paged addressing allows you to do things segments can't. And then came the PDP-11 (which was more or less about compressing the S/360 into a mini; sorry, DEC fans), RSX/VMS and Unix.

    • > TIMI is just a bytecode virtual machine and was designed as such from the beginning.

      It's a bit more complicated than that. For one, it's an ahead-of-time translation model. The object references are implemented as tagged pointers into a single large address space. The tagged pointers rely on dedicated support in the Power architecture. The Unix compatibility layer (PASE) simulates per-process address spaces by allocating dedicated address-space objects for each Unix process (these are called teraspace objects).
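      To give a flavor of how tagged pointers enforce capabilities, here is a toy model (entirely my own sketch, not IBM's layout: on real hardware the tag lives out of band, and all the names below are invented). The key property: only the trusted pointer-store path sets the tag, and any ordinary store clears it, so user code can't forge a pointer.

        #include <stdbool.h>
        #include <stdint.h>

        /* Toy model of a tagged memory word: data plus a hidden tag bit. */
        struct tagged_word {
            uint64_t data;
            bool     tag;    /* set only by the trusted store path */
        };

        /* Trusted path: the system stores a pointer and sets the tag. */
        void store_pointer(struct tagged_word *w, uint64_t ptr) {
            w->data = ptr;
            w->tag  = true;
        }

        /* Untrusted path: any ordinary store clears the tag, so a
           hand-crafted "pointer" is just data. */
        void store_data(struct tagged_word *w, uint64_t value) {
            w->data = value;
            w->tag  = false;
        }

        /* Dereference only succeeds through a tag check; the real
           machine would fault here instead of returning 0. */
        uint64_t load_pointer(const struct tagged_word *w) {
            return w->tag ? w->data : 0;
        }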

      When I read Frank Soltis' book a few years ago, the description of how the single level store was implemented involved segmentation, although I got the impression that the segments are implemented using pages in the Power architecture. The original CISC architecture (IMPI) may have implemented segmentation directly, although there is very little documentation on that architecture.

      This document describes the S/38 architecture, and many of the high level details (if not the specific implementation) also apply to the AS/400 and IBM i: https://homes.cs.washington.edu/~levy/capabook/Chapter8.pdf

> iAPX 432

Yes, this was a failure, the Itanium of the 1980s.

I also regard Ada as a failure. I worked with it many years ago. The Ada compiler would take 30 minutes to compile a program; Turbo C++ compiled equivalent code in a few seconds.

  • Machines are thousands of times faster now, yet C++ compilation is still slow somehow (templates? optimization? disinterest in compiler/linker performance? who knows?). The saving grace is having tons of cores and memory for parallel builds. Linking is still slow, though.

    Of course Pascal compilers (Turbo Pascal etc.) could be blazingly fast, since Pascal was designed to be compiled in a single pass; presumably linking was faster as well. I wonder how Delphi or current Pascal compilers compare? (Pascal also supports bounded strings and array bounds checks, IIRC.)

    • > I wonder how Delphi or current Pascal compilers compare?

      Just did a full build of our main Delphi application on my work laptop, sporting an Intel i7-1260P. It compiled and linked just shy of 1.9 million lines of code in 31 seconds. So, still quite fast.