Comment by fidotron

2 months ago

> RISC-V will get there, eventually.

Not trolling: I legitimately don't see why this is assumed to be true. It is one of those things that is true only once it has been achieved. Otherwise we would be able to create super high performance Sparc or SuperH processors, and we don't.

As you note, Arm once was fast, then slow, then fast. RISC-V has never actually been fast. It has enabled surprisingly good implementations by small numbers of people, but competing at the high end (mobile, desktop or server) it is not.

I think the bigger question is does RISC-V need to be fast? Who wants to make it fast?

I'm a chip designer and I see people using RISC-V as small processor cores for things like PCIe link training or various bookkeeping tasks. These don't need to be fast; they need to be small and low power, which means they will be relatively slow.

Most people on tech review sites only care about desktop / laptop / server performance. They may know about some of the ARM Cortex A series CPUs that have MMUs and can run desktop or smartphone Linux versions.

They generally don't care about the ARM Cortex M or R versions for embedded and real time use. Those are the areas where you don't need high performance and where RISC-V is already replacing ARM.

EDIT:

I'll add that there are companies that COULD make a fast RISC-V implementation.

Intel, AMD, Apple, Qualcomm, or Nvidia could redirect their existing teams to design a high performance RISC-V CPU. But why should they? They are heavily invested in their existing x86 and ARM CPU lines. Amazon and Google are using licensed ARM cores in their server CPUs.

What is the incentive for any of them to make a high performance RISC-V CPU? The only reason I can think of is that Softbank keeps raising ARM licensing costs and it gets high enough that it is more profitable to hire a team and design your own RISC-V CPU.

  • Of your list, Qualcomm and Nvidia are fairly likely to make high-perf RISC-V CPUs. Qualcomm because Arm sued them to try to stop them from designing their own Arm chips without paying a lot more money, and Nvidia because they already have a lot of teams making RISC-V chips, so it seems likely that they will try to unify on the one that doesn't require licensing.

    • Yeah, they could but then what is the market? Qualcomm wants to sell smartphone chips and Android can run on RISC-V and most Android Java apps could in theory run.

      But if you look at the Intel x86 smartphone chips from about 10 years ago they had to make an ARM to x86 emulator because even the Java apps contained native ARM instructions for performance reasons.

      Qualcomm is trying to push their ARM Snapdragon chips in Windows laptops but I don't think they are selling well.

      Nvidia could also make RISC-V based chips but where would they go? Nvidia is moving further away from the consumer space to the data center space. So even if Nvidia made a really fast RISC-V CPU it would probably be for the server / data center market and they may not even sell it to ordinary consumers.

      Or if they did it could be like the Ampere ARM chips for servers. Yeah you can buy one as an ordinary consumer but they were in the $4,000 range last time I looked. How many people are going to buy that?

  • China is likely where it would come from - ARM and x86 are owned by Western companies.

  • > I think the bigger question is does RISC-V need to be fast? Who wants to make it fast?

    Honestly, the initial reaction is it sounds like cope, and I know this because I've been saying it for ages to angry reactions. RISC-V looks for all the world like it is designed for competing with the 32 bit Arm ecosystem but that the designers didn't, and still don't, understand what 64 bit Arm is about.

    Secondly, it's been necessary to claim such things are forever on the way in order to maintain hype and get software support. Without it you wouldn't see nearly so much Linux buildchain work. (See the open source SuperH implementations for what happens if you admit you don't go for high performance).

    Finally though, as process nodes get smaller you can afford to put much more complex blocks in the same area, which can then burst through a series of operations and power off again, many times a second. (Edit to add: of course you know that, but it's still counter intuitive the extent to which it changes things over time. People have things like floating point support in places that not too long ago would have been completely minimalist, and there are some really extreme examples around).

    > I'll add that there are companies that COULD make a fast RISC-V implementation.

    Again, there is no proof of this until it actually happens. When Qualcomm were trying they wanted to change the spec of RISC-V, and I strongly suspect that is actually necessary.

RISC-V doesn't have the pitfalls of Sparc (register windows, branch delay slots), largely because we learned from that. It's in fact a very "boring" architecture. There's no one that expects it'll be hard to optimize for. There are at least 2 designs that have taped out in small runs and have high end performance.

  • RISC-V does not have the pitfalls of experimental ISAs from 45 years ago, but it has other pitfalls that have not existed in almost any ISA since the first vacuum-tube computers, like the lack of means for integer overflow detection and the lack of indexed addressing.

    Especially the lack of integer overflow detection is a choice of great stupidity, for which there exists no excuse.

    Detecting integer overflow in hardware is extremely cheap, its cost is absolutely negligible. On the other hand, detecting integer overflow in software is extremely expensive, increasing both the program size and the execution time considerably, because each arithmetic operation must be replaced by multiple operations.

    Because of the unacceptable cost, normal RISC-V programs choose to ignore the risk of overflows, which makes them unreliable.

    The highest performance implementations of RISC-V from previous years were forced to introduce custom extensions for indexed addressing, but those used inefficient encodings, because something like indexed addressing must be in the base ISA, not in an extension.

    • OK, look.

      Since my previous attempt to measure the impact of trap on signed overflow didn't seem to move your position one bit, I thought I'd give it a go in the most representative way I could think of:

      I built the same version of clang on an x86, an aarch64 and a RISC-V system using clang. Then I built another version with the `-ftrapv` flag enabled and compared the compile times of compiling programs using these clang builds running on real hardware:

          runtime:         x86         | aarch64                    | RISC-V (RVA23)
                           Zen1        |  A78          A55*         |  X100         A100
          clang A:         3.609±0.078 |  4.209±0.050   9.390±0.029 |  5.465±0.070  11.559±0.020
          clang-ftrapv A:  3.613±0.118 |  4.290±0.050   9.418±0.056 |  5.448±0.060  11.579±0.030
          clang B:         8.948±0.100 | 10.983±0.188  22.827±0.016 | 13.556±0.016  28.682±0.023
          clang-ftrapv B:  8.960±0.125 | 11.099±0.294  22.802±0.039 | 13.511±0.018  28.741±0.050

          Note: all cores clocked to about 2.2GHz; Zen1 can reach almost 4GHz.

      As you can see, once again the overhead of -ftrapv is quite low.

      Surprisingly, the -ftrapv overhead seems the highest on the Cortex-A78. My guess is that this is because clang generates a separate brk with a unique immediate for every overflow check, while on RISC-V it always branches to one unimp per function.

      Please tell me if you have a better suggestion for measuring the real world impact.

      Or heck, give me some artificial worst case code. That would also be an interesting data point.

      Notes:

      * The format is mean±variance

      * Spacemit X100 is a Cortex-A76 like OoO RISC-V core and A100 an in-order RISC-V core.

      * I tried to clock all of the cores to the same frequency of about 2.2GHz. *Except for the A55, which ran at 1.8GHz, but I linearly scaled the results.

      * Program A was the chibicc (8K loc) compiler and program B microjs (30K loc).

          binary size:
                        x86        aarch64    RISC-V
          clang:        212807768  216633784  195231816
          clang-ftrapv: 212859280  216737608  195419512
          increase:     0.024%     0.048%     0.096%

    • > On the other hand, detecting integer overflow in software is extremely expensive, increasing both the program size and the execution time considerably,

      Most languages don't care about integer overflow. Your typical C program will happily wrap around.

      If I really want to detect overflow, I can do this:

          add  t0, a0, a1
          bltu t0, a0, overflow   # unsigned compare: the sum wrapped if it is below an operand

      Which is one more instruction, which is not great, not terrible.

    • > On the other hand, detecting integer overflow in software is extremely expensive

      this just isn't true. both addition and multiplication can check for overflow in <2 instructions.

  • As a counterexample, I point to another relatively boring RISC, PA-RISC. It took off not (just) because the architecture was straightforward, but because HP poured cash into making it quick, and PA-RISC continued to be a very competitive architecture until the mass insanity of Itanic arrived. I don't see RISC-V vendors making that level of investment, either because they won't (selling to cheap markets) or can't (no capacity or funding), and a cynical take would say they hide them behind NDAs so no one can look behind the curtain.

    I know this is a very negative take. I don't try to hide my pro-Power ISA bias, but that doesn't mean I wouldn't like another choice. So far, however, I've been repeatedly disappointed by RISC-V. It's always "five or six years" from getting there.

    • I would not call PA-RISC boring. Already at launch there was no doubt that it was a better ISA than SPARC or MIPS, and later it was improved. At the time when PA-RISC 2.0 was replaced by Itanium it was not at all clear which of the 2 ISAs was better. The later failures to design high-performance Itanium CPUs make it plausible that if HP had kept PA-RISC 2.0, they might have had more competitive CPUs than with Itanium.

      SPARC (formerly called Berkeley RISC) and MIPS were pioneers that experimented with various features or lack of features, but they were inferior from many points of view to the earlier IBM 801.

      The RISC ISAs developed later, including ARM, HP PA-RISC and IBM POWER, have avoided some of the mistakes of SPARC and MIPS, while also taking some features from IBM 801 (e.g. its addressing modes), so they were better.

  • > RISC-V doesn't have the pitfalls of Sparc (register windows, branch delay slots),

    You're saying ISA design does have implementation performance implications then? ;)

    > There's no one that expects it'll be hard to optimize for

    [Raises hand]

    > There are at least 2 designs that have taped out in small runs and have high end performance.

    Are these public?

    Edit: I should add, I'm well aware of the cultural mismatch between HN and the semi industry, and have been caught in it more than a few times, but I also know the semi industry well enough to not trust anything they say. (Everything from well meaning but optimistic through to outright malicious depending on the company).

    • The 2 designs I'm thinking of are (tiresomely) under NDA, although I'm sure others will be able to say what they are. Last November I had a sample of one of them in my hand and played with the silicon at their labs, running a bunch of AI workloads. They didn't let me take notes or photographs.

      > There's no one that expects it'll be hard to optimize for

      No one who is an expert in the field, and we (at Red Hat) talk to them routinely.
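
For anyone following the overflow sub-thread above, here is a rough Python model of what the short RISC-V check sequences compute. It is only a sketch: 64-bit registers are simulated with masking, the function names are made up for illustration, and the instruction sequences in the docstrings are the commonly cited ones rather than what any particular compiler emits. Note that a single signed `blt` after the `add` is only a valid check when the second operand is known to be non-negative; the general signed case needs the sign of that operand as well.

```python
# Hypothetical model of 64-bit register arithmetic; all names are illustrative.
MASK64 = (1 << 64) - 1
INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1


def to_signed(x):
    """Interpret the low 64 bits of x as a two's-complement signed value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def add_unsigned(a, b):
    """Models: add t0,a0,a1 ; bltu t0,a0,overflow (operands in 0..2**64-1)."""
    total = (a + b) & MASK64      # wrapping add
    return total, total < a       # wrapped if the sum is below an operand


def add_signed(a, b):
    """Models: add t0,a0,a1 ; slti t1,a1,0 ; slt t2,t0,a0 ; bne t1,t2,overflow."""
    total = to_signed(a + b)      # wrapping add, read back as signed
    # Adding a negative b should make the sum smaller than a, and adding a
    # non-negative b should not; any mismatch means the result wrapped.
    return total, (b < 0) != (total < a)


def mul_unsigned(a, b):
    """Models: mul t0,a0,a1 ; mulhu t1,a0,a1 ; bnez t1,overflow."""
    full = a * b                  # Python ints are unbounded, so this is exact
    return full & MASK64, (full >> 64) != 0


if __name__ == "__main__":
    print(add_unsigned(MASK64, 1))
    print(add_signed(INT64_MAX, 1))
    print(add_signed(5, -1))
    print(mul_unsigned(1 << 32, 1 << 32))
```

Under this model the unsigned add costs one extra instruction, the general signed add three, and the unsigned multiply two.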

I don't think anybody suggests Oracle couldn't make faster SPARC processors, it's just that development of SPARC ended almost 10 years ago. At the time SPARC was abandoned, it was very competitive.

  • In single-threaded performance? That’s not how I remember it: Sun was pushing parallel throughput over everything else, with designs like the T-Series & Rock.

    • Perhaps not single thread, but Rock was a dead end a while before Oracle pulled the plug, and Sun/Oracle's core market of course was always servers not workstations. We used Niagara machines at my work around the T2 era, a long time ago, but they were very competitive if you could saturate the cores and had the RAM to back it up.

Because today, getting a fast CPU out isn't as much an engineering issue as it is about getting the investment to hire a world-class fab.

The most promising RISC-V companies today have not set out to compete directly with Intel, AMD, Apple or Samsung, but are targeting a niche such as AI, HPC and/or high-end embedded such as automotive.

And you can bet that Qualcomm has RISC-V designs in-house, but is only making ARM chips right now because ARM is where the market for smartphone and desktop SoCs is. Once Google starts allowing RVA23 on Android / ChromeOS, the floodgates will open.

  • It's very much both. You need millions of dollars for the fab, but you also need ~5 years to get 3 generations of CPUs out (to fix all the performance bugs you find in the first two).

Fast, RVA23-compatible microarchitectures already exist. Everything high performance seems to be based on RVA23, which is the current application profile and comparable to ARMv9 and x86-64-v4.

However, it takes time from microarchitecture to chips, and from chips to products on shelves.

The very first RVA23-compatible chip to show up will likely be the SpacemiT K3 SoC, due in development boards in April (i.e. next month).

More of them, and more performant ones, are coming this summer, such as a development board with the Tenstorrent Ascalon CPU in the form of the Atlantis SoC, which was taped out recently.

It is even possible such designs will show up in products aimed at the general public within the present year.