Comment by HarHarVeryFunny
2 days ago
The "R exp" is a subroutine call (it saves the return address in register B00), and I believe "J Bjk" is the subroutine return.
The Cray-1 didn't have a hardware stack, so a subroutine call is basically just a jump there and back, using a register to hold the return address rather than pushing it onto and popping it off a stack.
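To make the "jump there and back via a register" idea concrete, here is a minimal sketch of a made-up mini-VM. The opcodes CALL, RET, PRINT and HALT are hypothetical; CALL only loosely mirrors "R exp" saving to B00, and RET mirrors "J Bjk". This is not real Cray-1 encoding or semantics.

```python
# Hypothetical mini-VM: CALL saves the return address in a link register
# (standing in for B00) and RET jumps back through it. No stack is involved.

def run(program):
    pc = 0
    link = None  # the "link register" holding the return address
    trace = []
    while True:
        op, arg = program[pc]
        if op == "CALL":
            link = pc + 1  # save the return address in a register
            pc = arg       # jump to the subroutine
        elif op == "RET":
            pc = link      # jump back to the saved address
        elif op == "PRINT":
            trace.append(arg)
            pc += 1
        elif op == "HALT":
            return trace
        else:
            raise ValueError(f"unknown op {op}")

program = [
    ("PRINT", "main start"),  # 0
    ("CALL", 4),              # 1: call the subroutine at 4
    ("PRINT", "main end"),    # 2: execution resumes here after RET
    ("HALT", None),           # 3
    ("PRINT", "in sub"),      # 4: subroutine body
    ("RET", None),            # 5: return via the link register
]
print(run(program))  # ['main start', 'in sub', 'main end']
```

Note that a second CALL from inside the subroutine would overwrite `link`, which is exactly why non-leaf routines need to save it somewhere.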
Another oddity of the instruction set that stands out (since I'm in the process of defining a VM ISA for a hobby project) is that the branch instructions test a register (A0 or S0) rather than looking at status flags. In many modern CPUs (x86 and ARM, for example) a conditional branch such as if (x < y) is implemented as a compare followed by a branch, where the compare instruction sets the flags as if it had done a subtraction but doesn't actually write the result to any register. On the Cray this is evidently done with an actual subtraction that leaves its result in A0, followed by a branch that looks at the value of A0 (rather than at flags set by a CMP).
Gemini's explanation is that this helps pipelining.
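A small sketch of the two styles, purely as an illustration: the function names are made up, and `a0` just stands in for the A0 register. Python integers don't overflow, so the overflow flag that real signed compares also consult is deliberately left out.

```python
# Two ways to implement "if (x < y) goto target" for signed integers.
# Overflow is ignored because Python ints are unbounded.

def less_via_flags(x, y):
    # Flags style (x86/ARM): CMP computes x - y only to set condition flags;
    # no register receives the difference.
    diff = x - y
    flags = {"Z": diff == 0, "N": diff < 0}
    return flags["N"]  # "branch if less" tests the N (negative) flag

def less_via_register(x, y):
    # Register-test style, as described above for the Cray: the subtraction
    # really happens, its result lands in a register (a0 standing in for A0),
    # and the branch tests that register's value directly.
    a0 = x - y
    return a0 < 0  # "branch if A0 is negative"
```

Both give the same answer; the difference is whether the branch reads hidden flag state or an ordinary register.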
I recall from reading TAOCP that Knuth's MIX assembly language handled subroutines with self-modifying code: the subroutine's entry stores the caller's return address into the address field of its own exit JMP instruction (obviously not re-entrant!). This sort of thing was common when Knuth started out in the early '60s, and may still have been around by the time of the Cray.
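Here is a rough simulation of the MIX convention, where JMP also loads the rJ register with the address of the next instruction and the subroutine's first instruction (STJ) patches its own exit jump (written `EXIT JMP *` in Knuth's listings). The memory layout and the NOTE tracing op are hypothetical, added only so the run can be checked.

```python
# Simulated memory: each cell is a mutable [opcode, address] pair so the
# program can rewrite its own instructions at run time.

def run(memory, start=0):
    pc, rJ, trace = start, None, []
    while True:
        op, addr = memory[pc]
        if op == "JMP":
            rJ = pc + 1             # MIX JMP also sets rJ to the next address
            pc = addr
        elif op == "STJ":
            memory[addr][1] = rJ    # patch the address field of the exit JMP
            pc += 1
        elif op == "NOTE":          # hypothetical op: record a trace message
            trace.append(addr)
            pc += 1
        elif op == "HLT":
            return trace
        else:
            raise ValueError(f"unknown op {op}")

memory = [
    ["NOTE", "main"],  # 0
    ["JMP", 4],        # 1: call SUB (rJ becomes 2)
    ["NOTE", "back"],  # 2
    ["HLT", None],     # 3
    ["STJ", 6],        # 4: SUB entry: store rJ into EXIT's address field
    ["NOTE", "sub"],   # 5
    ["JMP", None],     # 6: EXIT: address filled in at run time
]
trace = run(memory)
```

After the run, cell 6 holds `["JMP", 2]`. A recursive call would overwrite that address before the outer call returned, which is why this scheme is not re-entrant.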
This Cray version isn't so bad: it just requires that a callee which itself calls other subroutines save and restore the B00 return address register. You could still support re-entrant routines this way, as long as you saved and restored the return address on a software stack rather than in a fixed location, but I wonder whether Cray compilers typically did that.
Apparently the reason for using a register rather than a stack for the return address was that memory access (to the stack) was so much slower.
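A minimal sketch of the "save the link register on a software stack only in non-leaf routines" approach, using a made-up mini-VM. All opcodes are hypothetical; `link` stands in for B00, and the recursive COUNTDOWN routine shows that re-entrancy works once the link is spilled to a stack.

```python
# Hypothetical mini-VM: Cray-style link register plus a software stack that
# only non-leaf routines use. Register n is the routine's argument.

def run(program, n):
    pc, link, stack, trace = 0, None, [], []
    while True:
        op, arg = program[pc]
        if op == "CALL":
            link, pc = pc + 1, arg   # return address goes into a register
        elif op == "RET":
            pc = link                # return through the register
        elif op == "PUSH_LINK":
            stack.append(link)       # save link before making further calls
            pc += 1
        elif op == "POP_LINK":
            link = stack.pop()       # restore link before returning
            pc += 1
        elif op == "OUT":
            trace.append(n)
            pc += 1
        elif op == "DEC":
            n -= 1
            pc += 1
        elif op == "JZ":
            pc = arg if n == 0 else pc + 1
        elif op == "HALT":
            return trace
        else:
            raise ValueError(f"unknown op {op}")

countdown = [
    ("CALL", 2),         # 0: main calls COUNTDOWN
    ("HALT", None),      # 1
    ("PUSH_LINK", None), # 2: COUNTDOWN is non-leaf, so spill the link
    ("OUT", None),       # 3
    ("JZ", 7),           # 4: stop recursing when n == 0
    ("DEC", None),       # 5
    ("CALL", 2),         # 6: recursive call clobbers the link register...
    ("POP_LINK", None),  # 7: ...so restore it from the stack
    ("RET", None),       # 8
]
```

A leaf routine would simply omit PUSH_LINK/POP_LINK and pay nothing for the stack.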
I'm kinda tempted to use this for the VM I'm designing, which will run on the 6502 8-bit micro. The 6502 doesn't have any 16-bit arithmetic, so pushing and popping through a software-defined 16-bit stack pointer is slow, and saving the return address to zero page would certainly be faster. It'd mean that rather than always pushing/popping the return address on every call, you only do it in the callee when needed (i.e. when the callee itself makes calls). It's an interesting idea!
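To illustrate why a software 16-bit stack is costly on an 8-bit CPU, here is a rough Python model in which every memory touch is one byte. The zero-page addresses, the initial stack pointer, and the `byte_ops` counter are all assumptions made up for the sketch; the counter is only a crude proxy and not a 6502 cycle count (it ignores, for example, the loads needed to read the pointer).

```python
class SoftStack6502:
    """Hypothetical 8-bit-style model: every memory update is one byte."""
    SP_LO, SP_HI = 0x00, 0x01      # assumed zero-page home of the 16-bit SP
    LINK_LO, LINK_HI = 0x02, 0x03  # assumed zero-page "link register"

    def __init__(self, sp=0x0200):
        self.mem = bytearray(0x10000)
        self.mem[self.SP_LO] = sp & 0xFF
        self.mem[self.SP_HI] = sp >> 8
        self.byte_ops = 0  # rough cost: byte-sized updates performed

    def sp(self):
        return self.mem[self.SP_HI] << 8 | self.mem[self.SP_LO]

    def _dec_sp(self):
        # 16-bit decrement done one byte at a time, propagating the borrow.
        m = self.mem
        m[self.SP_LO] = (m[self.SP_LO] - 1) & 0xFF
        self.byte_ops += 1
        if m[self.SP_LO] == 0xFF:  # low byte wrapped: borrow from high byte
            m[self.SP_HI] = (m[self.SP_HI] - 1) & 0xFF
            self.byte_ops += 1

    def push16(self, value):
        # Push the high byte, then the low byte, decrementing the SP first.
        for byte in (value >> 8, value & 0xFF):
            self._dec_sp()
            self.mem[self.sp()] = byte
            self.byte_ops += 1

    def save_link(self, value):
        # Zero-page link register: two byte stores, no pointer arithmetic.
        self.mem[self.LINK_LO] = value & 0xFF
        self.mem[self.LINK_HI] = value >> 8
        self.byte_ops += 2
```

With the stack pointer at 0x0200, `push16(0x1234)` needs a borrow on the first decrement and ends up with five byte updates, while `save_link(0x1234)` needs two. Popping would need the mirror-image increment with carry.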
Cray's earlier design for Control Data, before he started his own company, was interesting too. Each subroutine had to start with a jump instruction, and the subroutine call instruction would overwrite that location with a jump back to the caller, then start executing at the word after it. To return from a subroutine you just branched back to its beginning.
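A rough simulation of this return-jump scheme, as a sketch only: RJ is the CDC return-jump mnemonic, but JMP, NOTE and HALT are illustrative stand-ins, and the model assumes one instruction per memory cell (the real machines packed instructions into wider words, which is ignored here).

```python
# Hypothetical model of a CDC-style return jump: "RJ k" overwrites cell k
# (the subroutine's entry word) with a jump back to the caller, then starts
# executing at k + 1. Returning is just a jump to the entry word.

def run(memory, start=0):
    pc, trace = start, []
    while True:
        op, arg = memory[pc]
        if op == "RJ":
            memory[arg] = ("JMP", pc + 1)  # plant the jump back to the caller
            pc = arg + 1                   # body starts after the entry word
        elif op == "JMP":
            pc = arg
        elif op == "NOTE":                 # hypothetical op: record a trace
            trace.append(arg)
            pc += 1
        elif op == "HALT":
            return trace
        else:
            raise ValueError(f"unknown op {op}")

memory = [
    ("NOTE", "main"),  # 0
    ("RJ", 4),         # 1: call SUB
    ("NOTE", "back"),  # 2
    ("HALT", None),    # 3
    ("JMP", None),     # 4: SUB entry word, overwritten by RJ
    ("NOTE", "sub"),   # 5: SUB body
    ("JMP", 4),        # 6: return: branch back to the entry word
]
trace = run(memory)
```

As with the MIX example, the return address lives in the code itself, so a recursive call would overwrite it before the outer call finished.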