← Back to context

Comment by Manfred

1 year ago

> At least in the context of x86 emulation, among all 3 architectures we support, RISC-V is the least expressive one.

RISC was explained to me as a reduced instruction set computer in computer science history classes, but I see a lot of articles and proposed new RISC-V profiles about "we just need a few more instructions to get feature parity".

I understand that RISC-V is just a convenient alternative to other platforms for most people, but does this also mean the RISC dream is dead?

As I've heard it explained, RISC in practise is less about "an absolutely minimalist instruction set" and more about "don't add any assembly programmer conveniences or other such cleverness, rely on compilers instead of frontend silicon when possible".

Although as I recall from reading the RISC-V spec, RISC-V was rather particular about not adding "combo" instructions when common instruction sequences can be fused by the frontend.

My (far from expert) impression of RISC-V's shortcomings versus x86/ARM is more that the specs were written starting with the very basic embedded-chip stuff, and then over time more application-cpu extensions were added. (The base RV32I spec doesn't even include integer multiplication.) Unfortunately they took a long time to get around to finishing the bikeshedding on bit-twiddling and simd/vector extensions, which resulted in the current functionality gaps we're talking about.

So I don't think those gaps are due to RISC fundamentalism; there's no such thing.

  • Put another way, "try to avoid instructions that can't be executed in a single clock cycle, as those introduce silicon complexity".

    • But that's not even close to true, either, eg any division or memory operation.

      In practice there's no such thing as "RISC" or "CISC" anymore really, they've all pretty much converged. At best you can say "RISC" now just means that there aren't any mixed load + alu instructions, but those aren't really used in x86 much, either

      1 reply →

  • >and more about "don't add any assembly programmer conveniences or other such cleverness, rely on compilers instead of frontend silicon when possible"

    What are the advantages of that?

    • It shifts implementation complexity from hardware onto software. It's not an inherent advantage, but an extra compiler pass is generally cheaper than increased silicon die area, for example.

      On a slight tangent, from a security perspective, if your silicon is "too clever" in a way that introduces security bugs, you're screwed. On the other hand, software can be patched.

      2 replies →

    • Instructions can be completed in one clock cycle, which removes a lot of complexity compared to instructions that require multiple clock cycles.

      Removed complexity means you can fit more stuff into the same amount of silicon, and have it be quicker with less power.

      6 replies →

In order to have an instruction set that a student can implement in a single semester class you need to make simplifications like having all instructions have two inputs and one output. That also makes the lives of researchers experimenting one processor design a lot simpler as well. But it does mean that some convenient instructions are off the table for getting to higher performance.

That's not the whole story, a simpler pipeline takes less engineering resources for teams going to a high performance design so they can spend more time optimizing.

RISC is generally a philosophy of simplification but you can take it further or less far. MIPS is almost as simplified as RISC-V but ARM and POWER are more moderate in their simplifications and seem to have no trouble going toe to toe with x86 in high performance arenas.

But remember there are many niches for processors out there besides running applications. Embedded, accelerators, etc. In the specific niche of application cores I'm a bit pessimistic about RISC-V but from a broader view I think it has a lot of potential and will probably come to dominate at least a few commercial niches as well as being a wonderful teaching and research tool.

The RISC dream was to simplify CPU design because most software was written using compilers and not direct assembly.

Characteristics of classical RISC:

- Most data manipulation instructions work only with registers.

- Memory instructions are generally load/store to registers only.

- That means you need lots of registers.

- Do your own stack because you have to manually manipulate it to pass parameters anyway. So no CALL/JSR instruction. Implement the stack yourself using some basic instructions that load/store to the instruction pointer register directly.

- Instruction encoding is predictable and each instruction is the same size.

- More than one RISC arch has a register that always reads 0 and can't be written. Used for setting things to 0.

This worked, but then the following made it less important:

- Out-of-order execution - generally the raw instruction stream is a declaration of a path to desired results, but isn't necessarily what the CPU is really doing. Things like speculative execution, branch prediction and register renaming are behind this.

- SIMD - basically a separate wide register space with instructions that work on all values within those wide registers.

So really OOO and SIMD took over.

Is there a RISC dream? I think there is an efficiency "dream", there is a performance "dream", there is a cost "dream" — there are even low-complexity relative to cost, performance and efficiency "dreams" — but a RISC dream? Who cares more about RISC than cost, performance, efficiency and simplicity?

  • There was such dream. It was about getting the mind-bogglingly simple CPU, put caches into the now empty place where all the control logic used to be, and clock it up the wazoo, and let the software deal with load/branch delays, efficiently using all 64 registers, etc. That'll beat the hell out of those silly CISC architectures at performance, and at the fraction of the design and production costs!

    This didn't work out, for two main reasons: first, just being able to turn clocks hella high is still not enough to get great performance: you really do want your CPU to be super-scalar, out-of-order, and with great branch predictor, if you need amazing performance. But when you do all that, the simplicity of RISC decoding stops mattering all that much, as Pentium II demonstrated when it equalled DEC Alpha on performance, while still having practically useful things like e.g. byte loads/stores. Yes, it's RISC-like instructions under the hood but that's an implementation detail, no reason to expose it to the user in the ISA, just as you don't have to expose the branch delay slots in your ISA because it's a bad idea to do so: e.g. MIPS II added 1 additional pipeline stage, and now they needed two branch/load delay slots. Whoops! So they added interlocks anyway (MIPS originally stood for "Microprocessor without Interlocked Pipelined Stages", ha-ha) and got rid of the load delays; they still left 1 branch delay slot exposed due to backwards compatibility, and the circuitry required was arguably silly.

    The second reason was that the software (or compilers, to be more precise) can't really deal very well with all that stuff from the first paragraph. That's what sank Itanium. That's why nobody makes CPUs with register windows any more. And static instruction scheduling in the compilers still can't beat dynamic instruction reordering.

    • Great post as it is also directly applicable to invalidate the myth that the arm instruction set somehow makes the whole cpu better than analogous x86 silicon. It might be true and responsible for like 0.1% (guesstimate) of the total advantage; it's actually all RISC under the hood and both ISAs need decoders, x86 might need a slightly bigger one which amounts to accounting noise in terms of area.

      c.f. https://chipsandcheese.com/2021/07/13/arm-or-x86-isa-doesnt-...

    • > This didn't work out

      ... except it did.

      You had literal students design chips that outperformed industry cores that took huge teams and huge investment.

      Acorn had a team of just a few people build a core that outperformed an i460 with likely 1/100 investment. Not to mention the even more expensive VAX chips.

      Can you imagine how fucking baffled the DEC engineers at the time were when their absurdly complex and absurdly expensive VAX chip were smocked by a bunch of first time chip designers?

      > as Pentium II demonstrated

      That chip came out in 1997. The original RISC chip research happened in the early 80s or even earlier. It did work, its just that x86 was bound to the PC market and Intel had the finances huge teams hammer away at the problem. x86 was able to overtake Alpha because DEC was not doing well and they couldn't invest the required amount.

      > no reason to expose it to the user in the ISA

      Except that hidden the implementation is costly.

      If you give 2 equal teams the same amount of money, what results in a faster chip. A team that does a simply RISC instruction set. Or a team that does a complex CISC instruction set, transforms that into an underlying simpler instruction set?

      Now of course for Intel, they had backward comparability so they had to do what they had to do. They were just lucky they were able to invest so much more then all the other competitors.

      4 replies →

    • To add on to what the sibling said, ignoring that CISC chips have a separate frontend to break complex instructions down into an internal RISC-like instruction set and thus the difference is blurred, more RISC instruction sets do tend to win on performance and power for the main reason that the instruction set has a fixed width. This means that you can fetch a line of cache and 4 byte instructions you could start decoding 32 instructions in parallel whereas x86’d variableness makes it harder to keep the super scalar pipeline full (it’s decoder is significantly more complex to try to still extract parallelism which further slows it down). This is a bit more complex on ARM (and maybe RISCV?) where you have two widths but even then in practice it’s easier to extract performance out of it because x86 can be anywhere from 1-4 bytes (or 1-8? Can’t remember) which makes it hard to find boundary instructions in parallel.

      There’s a reason that Apple is whooping AMD and Intel on performance/watt and it’s not solely because they’re on a newer fab process (it’s also why AMD and Intel utterly failed to get mobile CPU variants of their chips off the ground).

      4 replies →

  • But we define the RISC dream as a dream that efficiency, performance and low-cost could be achieved by cores with very small instruction sets?

    • Not small instruction sets, simplified instruction sets. RISC’s main trick is to reduce the number of addressing modes (eg, no memory indirect instructions) and reduce the number of memory operands per instruction to 0 or 1. Use the instruction encoding space for more registers instead.

      The surviving CISCs, x86 and z390 are the least CISCy CISCs. The surviving RISCs, arm and power, are the least RISCy RISCs.

      RISC V is a weird throwback in some aspects of its instruction set design.

      2 replies →

In this particular context, they're trying to run code compiled for x86_64 on RISCV5. The need from "we just need a few more instructions to get feature parity" comes from trying to run code that is already compiled for an architecture with all those extra instructions.

In theory, if you compiled the original _source_ code for RISC, you'd get an entirely binary and wouldn't need those specific instructions.

In practice, I doubt anyone is going to actually compile these games for RISCV5.

The explanation that I've seen is that it's "(reduced instruction) set computer" - simple instructions, not necessarily few.

Beyond the most trivial of microcontrollers and experimental designs there are no RISC chips under the original understanding of RISC. The justification for RISC evaporated when we became able to put 1 million, 100 million, and so on, transistors on a chip. Now all the chips called "RISC" include vector, media, encryption, network, FPUs, and etc. instructions. Someone might want to argue that some elements of RISC designs (orthogonal instruction encoding, numerous registers, etc.) make a particular chip a RISC chip. But they really aren't instances of the literal concept of RISC.

To me, the whole RISC-V interest is all just marketing. As an end user I don't make my own chips and I can't think of any particular reason I should care whether a machine has RISC-V, ARM, x86, SPARC, or POWER. In the end my cost will be based on market scale and performance. The licensing cost of the design will not be passed on to me as a customer.