Comment by xenadu02
17 hours ago
If you happen to know... what was the reasoning behind the oddball stack architecture? It feels like Intel must have had this already designed for some other purpose so they tossed it in. I can't imagine why anyone would think this arch was a good idea.
Then again... they did try to force VLIW and APX on us so Intel has a history of "interesting" ideas about processor design.
edit: You addressed it in the article and I guess that's probably the reason but for real... what a ridiculous hand-wavy thing to do. Just assume it will be fine? If the anecdotes about Itanium/VLIW are true they committed the same sin on that project: some simulations with 50 instructions were the (claimed) basis for that fiasco. Methinks cutting AMD out of the market might have been the real reason but I have no proof for that.
Stack-based architectures have an appeal, especially for mathematics. (Think of the HP calculator.) And the explanation that they didn't have enough instruction bits also makes sense. (The co-processor uses 8086 "ESCAPE" instructions, but 5 bits get used up by the ESCAPE itself.) I think that the 8087's stack could have been implemented a lot better, but even so, there's probably a reason that hardly any other systems use a stack-based architecture. And the introduction of out-of-order execution made stacks even less practical.
To expand on this a little bit more:
x86 has a general pattern of encoding operands, the ModR/M byte(s), which gives you either two register operands, or a register and a memory operand. Intel also used a trick that repurposes one of the register operand fields as extra opcode bits, at the cost of sacrificing that operand.
There are 8 escape opcodes, and each of them is followed by a ModR/M byte. If you use two-address instructions, that gives you just 8 instructions you can implement... not enough to do anything useful! But if you're happy with one-address instructions, you get 64 instructions with a register operand and 64 instructions with a memory operand.
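To make the counting above concrete, here is a rough sketch of the arithmetic. It is not a real decoder. The helper names (`split_modrm`, `operand_kind`, `operation_slots`) are made up for illustration, and the layout assumed is the standard one: ESC opcodes 0xD8 through 0xDF followed by a ModR/M byte with 2-bit mod, 3-bit reg, and 3-bit r/m fields.

```python
# Sketch of the encoding-space arithmetic, not a real x86 decoder.
# Assumed layout: ESC opcodes 0xD8..0xDF, then a ModR/M byte split as
# mod (bits 7-6), reg (bits 5-3), r/m (bits 2-0).

ESC_OPCODES = range(0xD8, 0xE0)  # the 8 escape opcodes


def split_modrm(modrm):
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    return (modrm >> 6) & 0b11, (modrm >> 3) & 0b111, modrm & 0b111


def operand_kind(modrm):
    """mod == 3 selects a register operand; any other mod selects memory."""
    mod, _, _ = split_modrm(modrm)
    return "register" if mod == 3 else "memory"


def operation_slots(reg_is_opcode_extension):
    """Count the operations available for one operand kind.

    If the reg field names a second operand, only the choice of ESC opcode
    selects the operation. If the reg field is repurposed as opcode bits,
    every ESC opcode carries 8 operations.
    """
    per_opcode = 8 if reg_is_opcode_extension else 1
    return len(ESC_OPCODES) * per_opcode


two_address = operation_slots(reg_is_opcode_extension=False)  # 8
one_address = operation_slots(reg_is_opcode_extension=True)   # 64
```

The 64 slots exist once for `operand_kind(...) == "register"` and once for `"memory"`, which is where the "64 plus 64" figure comes from.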
A stack itself is pretty easy to compile for, until you have to spill a register because there are too many live variables on the stack. Then the spill logic becomes a nightmare. My guess is that the designers were thinking along these lines--organizing the registers in the stack is an efficient way to use the encoding space, and a fairly natural way to write expressions--and didn't have the expertise or the communication to realize that the design came with some edge cases that were painfully sharp to deal with.
> there's probably a reason that hardly any other systems use a stack-based architecture
I don't know about other backend guys, but I disliked the stack architecture because it is just incompatible with enregistering variables, register allocation by live range analysis, common subexpression elimination, etc.
There are software workarounds for some of those and very simple hardware workarounds for the others. In a stack-based architecture there should also be some directly-addressable registers for storing long-lived temporary variables. Most stack-based architectures included some set of stack shuffling operations that solved the problem of common subexpression elimination.
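As a small illustration of the stack-shuffling point, here is a toy stack machine. The instruction names (`push`, `dup`, `swap`, `add`, `mul`) and the `run` function are invented for this sketch and do not correspond to any particular instruction set. It evaluates (a + b) * (a + b) while computing a + b only once.

```python
def run(program, env):
    """Execute a list of (op, *args) tuples on a single operand stack."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(env[args[0]])      # load a named variable
        elif op == "dup":
            stack.append(stack[-1])         # duplicate the top of the stack
        elif op == "swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown op: {op}")
    return stack.pop()


# (a + b) * (a + b): the common subexpression is computed once, then duplicated.
program = [("push", "a"), ("push", "b"), ("add",), ("dup",), ("mul",)]
result = run(program, {"a": 2, "b": 3})  # 25
```

This only works so neatly because the reuse happens immediately. When the second use is far away, the value has to be dug out from under whatever was pushed in between, which is the case the directly-addressable registers are meant to cover.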
The real disadvantage is that the stack operations share the output operand, which introduces a resource dependency between otherwise independent operations, which prevents their concurrent execution.
There are hardware workarounds even for this, but the hardware would become much more complex, which is unlikely to be worthwhile.
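A rough way to see the dependency problem is to count the longest chain of dependent instructions for a*b + c*d. This is a deliberately naive model: unit latency, unlimited execution units, no renaming, and the whole stack treated as one shared location (the pessimistic case described above). The `critical_path` function and the location names are hypothetical.

```python
def critical_path(instrs):
    """Longest dependency chain, in instructions.

    instrs is a list of (dest, sources). An instruction can start once all
    of its sources are available and takes one step to produce dest.
    """
    ready = {}    # location -> step at which its latest value is available
    longest = 0
    for dest, sources in instrs:
        start = max((ready.get(s, 0) for s in sources), default=0)
        ready[dest] = start + 1
        longest = max(longest, ready[dest])
    return longest


# Register machine: the loads are independent, and so are the two multiplies.
register_code = [
    ("r1", ["a"]), ("r2", ["b"]), ("r3", ["c"]), ("r4", ["d"]),
    ("r5", ["r1", "r2"]),   # a * b
    ("r6", ["r3", "r4"]),   # c * d
    ("r7", ["r5", "r6"]),   # sum
]

# Stack machine: every instruction both reads and writes the shared stack.
stack_code = [
    ("stack", ["stack", "a"]), ("stack", ["stack", "b"]), ("stack", ["stack"]),
    ("stack", ["stack", "c"]), ("stack", ["stack", "d"]), ("stack", ["stack"]),
    ("stack", ["stack"]),
]

register_depth = critical_path(register_code)  # 3
stack_depth = critical_path(stack_code)        # 7
```

Both versions have 7 instructions, but in this model the register version could finish in 3 steps given enough execution units, while the stack version is fully serialized.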
The main influencer of the 8087 architecture, William Kahan, had previously worked on the firmware of the HP scientific calculators, so he was well experienced in implementing numeric algorithms by using stacks.
When writing in assembly language, the stack architecture is very convenient and it minimizes the program size. That is why most virtual machines used for implementing interpreters for programming languages have been stack-based.
The only real disadvantage of the stack architecture is that it prevents the concurrent execution of operations, because all operations have a resource dependency by sharing the stack as output location.
At the time the 8087 was designed, the possibility of implementing parallel execution of instructions in hardware was still very far in the future, so this disadvantage was dismissed.
Replacing the stack by individually addressable registers is not the only possible method for enabling concurrent execution of instructions. There are 2 alternatives that can continue to use a stack architecture.
One can have multiple operand stacks and each instruction must contain a stack number. Then the compiler assigns each chain of dependent operations to one stack and the CPU can execute in parallel as many independent chains of dependent instructions as there are stacks.
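A minimal sketch of the multiple-stack idea, under invented assumptions: every instruction carries a stack number, a hypothetical `move` instruction pops a value from one stack and pushes it onto another, each instruction takes one step, and instructions on different stacks can overlap freely. None of this describes real hardware.

```python
def run_multistack(program, env, num_stacks=2):
    """Run (op, stack_id, *args) tuples; return (result, steps needed)."""
    stacks = [[] for _ in range(num_stacks)]
    done = [0] * num_stacks   # step at which each stack's last op finishes
    for op, sid, *args in program:
        start = done[sid]     # must wait for the previous op on this stack
        s = stacks[sid]
        if op == "push":
            s.append(env[args[0]])
        elif op == "move":
            src = args[0]     # pop from stack src, push onto stack sid
            start = max(start, done[src])   # depends on both stacks
            s.append(stacks[src].pop())
            done[src] = start + 1
        elif op == "add":
            b, a = s.pop(), s.pop()
            s.append(a + b)
        elif op == "mul":
            b, a = s.pop(), s.pop()
            s.append(a * b)
        else:
            raise ValueError(f"unknown op: {op}")
        done[sid] = start + 1
    return stacks[0].pop(), max(done)


env = {"a": 2, "b": 3, "c": 4, "d": 5}

# a*b on stack 0 and c*d on stack 1 are independent chains.
two_stacks = [
    ("push", 0, "a"), ("push", 0, "b"), ("mul", 0),
    ("push", 1, "c"), ("push", 1, "d"), ("mul", 1),
    ("move", 0, 1), ("add", 0),
]
# The same expression with everything on stack 0.
one_stack = [
    ("push", 0, "a"), ("push", 0, "b"), ("mul", 0),
    ("push", 0, "c"), ("push", 0, "d"), ("mul", 0),
    ("add", 0),
]

parallel = run_multistack(two_stacks, env)   # (26, 5)
serial = run_multistack(one_stack, env)      # (26, 7)
```

The two-stack version needs one extra instruction (the `move`) but finishes in 5 steps instead of 7 in this model, because the two multiply chains overlap.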
The other variant is to also have multiple operand stacks but to keep the same instruction set with only one implicit stack, while implementing simultaneous multi-threading (SMT). Each hardware thread then uses its own stack while sharing the parallel execution units, so one can execute in parallel as many instructions as there are threads. This variant would need many more threads than a CPU with registers, which combines superscalar execution with SMT, so one would need 8 or more SMT threads to be competitive.