Don't blame the ISA - blame the silicon implementations AND the software with no architecture-specific optimisations.
RISC-V will get there, eventually.
I remember that ARM started as a power-conscious speed demon, then was surpassed by x86s and PPCs on desktops and moved to embedded, where it shone by being very frugal with power, only to now be leaving the embedded space with implementations optimised more for speed than for power.
All of those things are solved with modern extensions. It's like comparing pre-MMX x86 code with modern x86. Misaligned loads and stores are covered by Zicclsm (which guarantees they work, though not that they are fast), bit manipulation by Zba/Zbb/Zbs (Zbc remains optional), and atomic memory operations are made mandatory by Ziccamoa.
All of these extensions are mandatory in the RVA22 and RVA23 profiles and so will be implemented on any up-to-date RISC-V core. It's definitely worth setting your compiler target appropriately before making comparisons.
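As a rough illustration of checking the target before benchmarking, here is a minimal Python sketch that parses a RISC-V ISA string of the same general shape as a GCC/Clang `-march` value (e.g. `rv64gc_zba_zbb_zbs`) and reports which of the extensions named above are missing. The helper name and example strings are hypothetical, and only the underscore-separated multi-letter extensions are inspected.

```python
# Multi-letter extensions discussed above (Zbc left out: it is optional).
PROFILE_EXTS = {"zicclsm", "ziccamoa", "zba", "zbb", "zbs"}


def missing_extensions(isa: str, required=PROFILE_EXTS) -> set[str]:
    """Return the required multi-letter extensions absent from an ISA string.

    Hypothetical helper: parts[0] is the base plus single-letter extensions
    (e.g. "rv64gc") and is ignored; everything after the first underscore
    is treated as a multi-letter extension name.
    """
    parts = isa.strip().lower().split("_")
    present = set(parts[1:])
    return set(required) - present
```

For example, `missing_extensions("rv64gc_zba_zbb_zbs")` reports that only `zicclsm` and `ziccamoa` are absent from that string, which is a hint that the code was built for an older baseline than the profile.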
Regarding misaligned reads, IIRC only x86 hides non-aligned memory access. It's still slower than aligned reads. Other processors just fault, so it would make sense to do the same on riscv.
The problem is decades of software being written on a chip that from the outside appears not to care.
Also, the bit manipulation extension wasn't part of the core, so things like bit rotation are slow for no good reason if you want portable code. Why? Who knows.
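For context, here is a minimal Python model of what a 32-bit rotate costs when no rotate instruction can be assumed: two shifts and an OR, plus masking. Zbb provides dedicated rotates (`rol`/`ror`, and `rolw`/`rorw`/`roriw` for 32-bit values on RV64), so code built for a baseline without Zbb pays for the longer sequence. The function name is illustrative.

```python
MASK32 = 0xFFFF_FFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits using only shifts, OR and a mask."""
    n &= 31        # rotate amount is taken modulo 32
    x &= MASK32    # treat x as an unsigned 32-bit value
    # Generic shift/shift/OR sequence used when no rotate instruction exists.
    return ((x << n) | (x >> ((32 - n) & 31))) & MASK32
```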
Huh? They have no idea what they are doing. If data is unaligned, the solution is memcpy, not compiler optimizations; also, their hack of 17 loads is a buffer overflow. Also not an ISA spec problem.
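In C, copying unaligned data into a local with `memcpy` lets the compiler choose whatever load sequence the target permits. The Python sketch below mimics the conservative fallback: four byte loads combined with shifts, which yields the same value as a direct little-endian load from an unaligned offset (`struct.unpack_from` stands in for the memcpy-style load). The helper name is hypothetical.

```python
import struct


def load_u32_le_bytewise(buf: bytes, off: int) -> int:
    """Assemble a little-endian u32 from four single-byte loads.

    Mirrors the lbu/slli/or style sequence a compiler may emit when it
    cannot rely on a fast misaligned word load.
    """
    if off < 0 or off + 4 > len(buf):
        # Reading past the end is exactly the kind of overflow to avoid.
        raise IndexError("u32 load out of bounds")
    return (buf[off]
            | buf[off + 1] << 8
            | buf[off + 2] << 16
            | buf[off + 3] << 24)


data = bytes(range(16))
# Offset 3 is deliberately not 4-byte aligned.
value = load_u32_le_bytewise(data, 3)
same = struct.unpack_from("<I", data, 3)[0]
```

Both `value` and `same` come out as `0x06050403` here.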
Not trolling: I legitimately don't see why this is assumed to be true. It is one of those things that is true only once it has been achieved. Otherwise we would be able to create super high performance Sparc or SuperH processors, and we don't.
As you note, Arm once was fast, then slow, then fast. RISC-V has never actually been fast. It has enabled surprisingly good implementations by small numbers of people, but competing at the high end (mobile, desktop or server) it is not.
I think the bigger question is does RISC-V need to be fast? Who wants to make it fast?
I'm a chip designer and I see people using RISC-V as small processor cores for things like PCIe link training or various bookkeeping tasks. These don't need to be fast, they need to be small and low power which means they will be relatively slow.
Most people on tech review sites only care about desktop / laptop / server performance. They may know about some of the ARM Cortex A series CPUs that have MMUs and can run desktop or smartphone Linux versions.
They generally don't care about the ARM Cortex M or R versions for embedded and real time use. Those are the areas where you don't need high performance and where RISC-V is already replacing ARM.
EDIT:
I'll add that there are companies that COULD make a fast RISC-V implementation.
Intel, AMD, Apple, Qualcomm, or Nvidia could redirect their existing teams to design a high performance RISC-V CPU. But why should they? They are heavily invested in their existing x86 and ARM CPU lines. Amazon and Google are using licensed ARM cores in their server CPUs.
What is the incentive for any of them to make a high performance RISC-V CPU? The only reason I can think of is that Softbank keeps raising ARM licensing costs and it gets high enough that it is more profitable to hire a team and design your own RISC-V CPU.
RISC-V doesn't have the pitfalls of Sparc (register windows, branch delay slots), largely because we learned from that. It's in fact a very "boring" architecture. There's no one that expects it'll be hard to optimize for. There are at least 2 designs that have taped out in small runs and have high end performance.
I don't think anybody suggests Oracle couldn't make faster SPARC processors, it's just that development of SPARC ended almost 10 years ago. At the time SPARC was abandoned, it was very competitive.
Because today, getting a fast CPU out isn't so much an engineering issue as it is about getting the investment for hiring a world-class fab.
The most promising RISC-V companies today have not set out to compete directly with Intel, AMD, Apple or Samsung, but are targeting a niche such as AI, HPC and/or high-end embedded such as automotive.
And you can bet that Qualcomm has RISC-V designs in-house, but is only making ARM chips right now because ARM is where the market for smartphone and desktop SoCs is.
Once Google starts allowing RVA23 on Android / ChromeOS, the flood gates will open.
Fast, RVA23-compatible microarchitectures already exist. Everything high performance seems to be based on RVA23, which is the current application profile and comparable to ARMv9 and x86-64-v4.
However, it takes time from microarchitecture to chips, and from chips to products on shelves.
The very first RVA23-compatible chips to show up will likely be the SpacemiT K3 SoC, due on development boards in April (i.e., next month).
More of them, and more performant ones, are coming this summer, such as a development board built around the Atlantis SoC with the Tenstorrent Ascalon CPU, which taped out recently.
It is even possible such designs will show up in products aimed at the general public within the present year.
> Don't blame the ISA - blame the silicon implementations
That's true, but tautological.
The issue is that the RISC-V core is the easy part of the problem, and nobody seems to even be able to generate a chip that gets that right without weirdness and quirks.
The more fundamental technical problem is that things like the cache organization and DDR interface and PCI interface and ... cannot just be synthesized. They require analog/RF VLSI designers doing things like clock forwarding and signal integrity analysis. If you get them wrong, your performance tanks, and, so far, everybody has gotten them wrong in various ways.
The business problem is the fact that everybody wants to be the "performance" RISC-V vendor, but nobody wants to be the "embedded" RISC-V vendor. This is a problem because practically anybody who is willing to cough up for a "performance" processor is almost completely insensitive to any cost premium that ARM demands. The embedded space is hugely sensitive to cost, but nobody is willing to step into it because that requires that you do icky ecosystem things like marketing, software, debugging tools, inventory distribution, etc.
This leads to the US business problem which is the fact that everybody wants to be an IP vendor and nobody wants to ship a damn chip. Consequently, if I want actual RISC-V hardware, I'm stuck dealing with Chinese vendors of various levels of dodginess.
A lot of times the path to the highest performing CPU seems to be to optimize for power first, then speed, then repeat. That's because power and heat are a major design constraint that limits speed.
I first noticed this way back with the Pentium 4 "Netburst" architecture vs. the smaller x86 cores that became the ancestor of the Core architecture. Intel eventually ran into a wall with P4 and then branched high performance cores off those lower-power ones and that's what gave us the venerable Core architecture that made Intel the dominant CPU maker for over a decade.
I think the story is a bit more complicated. Core succeeded precisely because Intel had both the low-power experience with Pentium-M and the high-power experience with Netburst. The P4 architecture told them a lot about what was and wasn't viable and at what complexity. When you look at the successor generations from Core, what you see are a lot of more complex P4-like features being re-added, but with the benefits of improved microarch and fab processes. Obviously we will never know, but I don't think you would get to Haswell or Skylake in the form they were without the learning experience of the P4.
In comparison, I think Arm is actually a very strong cautionary tale that focusing on power will not get you to performance. Arm processors remained pretty poor performance until designers from other CPU families entirely (PowerPC and Intel) took it on at Apple and basically dragged Arm to the performance level they are today.
NetBurst was supposed to be the application of RISC principles to x86 taken to its extreme (ultra-long pipelines to reduce clock-to-clock delay, highest clock speed possible: basically reducing work-per-clock and hoping that reduces complexity enough to increase clock speed to compensate). The ALU was 16 bits, "double pumped" with the carry split between the two, which led to 32-bit ALU operations that don't carry between the lower and upper halves actually finishing a clock cycle faster than those with a carry.
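To make the carry-split idea concrete, here is a small Python model (an illustration only, not a description of the actual NetBurst circuitry) that performs a 32-bit add as two 16-bit halves and reports whether a carry had to cross from the low half into the high half. On the account above, only additions where that flag is set pay the extra latency.

```python
MASK16 = 0xFFFF


def split_add32(a: int, b: int) -> tuple[int, bool]:
    """Model a 32-bit add done as two 16-bit halves ("double pumped").

    Returns (sum modulo 2**32, carry_crossed). carry_crossed is True when
    the low half carried into the high half.
    """
    lo = (a & MASK16) + (b & MASK16)   # low 16 bits first
    carry = lo >> 16                   # 1 if the low half overflowed
    hi = ((a >> 16) & MASK16) + ((b >> 16) & MASK16) + carry  # high half plus carry
    return ((hi & MASK16) << 16) | (lo & MASK16), bool(carry)
```

`split_add32(0x0000FFFF, 1)` needs the carry to cross, while `split_add32(0x00010001, 0x00020002)` does not.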
Core evolved from the Banias (Centrino) CPU core which was based on P3, not P4. Banias used the front-side bus from P4 but not the cores.
Banias was hyper optimized for power, the mantra was to get done quickly and go to sleep to save power. Somewhere along the line someone said "hey what happens if we don't go to sleep?" and Core was born.
There's the ARM video from LowSpecGamer, where they talk about how they forgot to connect power to the chip, and it was still executing code anyway. According to Steve Furber, the chip was accidentally being powered from the protection diodes alone. So ARM was incredibly power efficient from the very beginning.
Marcin is working with us on RISC-V enablement for Fedora and RHEL, he's well aware of the problem with current implementations. We're hopeful that this'll be pretty much resolved by the end of the year.
> AND the software with no architecture-specific optimisations
The optimizations that'd be applied to ARM and MIPS would be equally applicable to RISC-V. I do not believe this is a lack of software optimization issue.
We are well past the days where hand written assembly gives much benefit, and modern compilers like gcc and llvm do nearly identical work right up until it comes to instruction emission (including determining where SIMD instructions could be placed).
Unless these chips have very very weird performance characteristics (like the weirdness around x86's lea instruction being used for arithmetic) there's just not going to be a lot of missed heuristics.
If you make a spec that the wider industry cannot effectively implement into quality products, it's the spec that's wrong. And that's true for anything - whether it's RISC-V, ipv6, Matter, USB-C and so on.
That's what makes writing specs hard - you need people who understand implementation challenges at the table, not dreaming architects and academics.
RISC-V lacks a bunch of really useful relatively easy to implement instructions and most extensions are truly optional so you can't rely on them. That's the problem if you let a bunch of academics turn your ISA into a paper mill.
In theory you can spend a lot of effort to make a flawed ISA perform, but it will be neither easy nor pretty; e.g., real-world Linux distros can't distribute optimised packages for every uarch from dual-issue in-order RV64GC to 8-wide OoO RV64 with all the bells and whistles. Only in (deeply) embedded systems can you retarget the toolchain and optimise for each damn architecture subset you encounter.
ARM was never a "speed demon"; it started out as a low power small-area core and clearly had more complexity and thought put into it than MIPS or RISC-V.
A couple of corrections (the blog-post is by a colleague, but I'm not speaking for Marcin! :))
First, we do have a recent 'binutils' build[1] with test-suites in 67 minutes (it was on Milk-V "Megrez") in the Fedora RISC-V build system. This is a non-trivial improvement over the 143-minute build time reported in the blog.
Second, the current fastest development machine is not the Banana Pi BPI-F3. If we consider what is reasonably accessible today, it is the SiFive "HiFive P550" (P550 for short) and the upcoming UltraRISC "DP1000", for which we have access to an eval board. And as noted elsewhere in this thread, some RVA23-based machines should be available in "several months". (RVA23 == the latest ISA profile.)
FWIW, our FOSDEM talk from earlier this year, "Fedora on RISC-V: state of the arch"[1], gives an overview of the hardware situation. It also has a couple of related poor man's benchmarks (an 'xz' compression test and a 'binutils' build without the test-suite on the above two boards -- that's what I could manage with the time I had).
Edit: Marcin's RISC-V test was done on the StarFive "VisionFive 2". This small board has its strengths (upstreamed drivers), but it is not known for its speed!
It's a good solid reliable board, but over three years old at this point (in a fast-moving industry) and the maximum 8 GB RAM is quite challenging for some builds.
Binutils is fine, but on recent versions of gcc it wants to link four binaries at the same time, with each link using 4 GB RAM. I've found this fails on my 16 GB P550 Megrez with swap disabled, but works quickly and uses maybe 50 or 100 MB of swap if I enable it.
On the VisionFive 2 you'd need to use `-j1` (or `-j2` with swap enabled) which will nearly double or quadruple the build time.
Or use a better linker than `ld`.
At least the LLVM build system lets you set the number of parallel link jobs separately from the number of C/C++ jobs.
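A minimal Python sketch of the arithmetic behind these comments: if each link takes about 4 GB (the figure quoted above for recent GCC), the number of safe parallel links is roughly the RAM left after some headroom divided by that figure. The sketch then turns the estimate into CMake options for an LLVM build. `LLVM_PARALLEL_LINK_JOBS` (honoured only with the Ninja generator) and `LLVM_USE_LINKER` are real LLVM CMake options; the 2 GB headroom and the function names are assumptions.

```python
def link_jobs(ram_gb: float, gb_per_link: float = 4.0,
              headroom_gb: float = 2.0) -> int:
    """Estimate how many link steps can run concurrently without swapping.

    Assumes roughly gb_per_link of memory per link (the ~4 GB figure from
    the thread) and reserves headroom_gb (an assumption) for the OS and
    compile jobs. Always returns at least 1.
    """
    return max(1, int((ram_gb - headroom_gb) // gb_per_link))


def llvm_cmake_args(ram_gb: float, use_lld: bool = True) -> list[str]:
    """Assemble CMake arguments for an LLVM build on a memory-limited board."""
    args = [
        "-G", "Ninja",  # LLVM_PARALLEL_LINK_JOBS only takes effect with Ninja
        f"-DLLVM_PARALLEL_LINK_JOBS={link_jobs(ram_gb)}",
    ]
    if use_lld:
        # Use lld instead of GNU ld, in the spirit of "use a better linker".
        args.append("-DLLVM_USE_LINKER=lld")
    return args
```

Under these assumptions a 16 GB board gets three concurrent links and an 8 GB board one; the list would be passed to `cmake` alongside the usual source-directory and build-type options.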
> I've found this fails on my 16 GB P550 Megrez with swap disabled but works quickly and uses maybe 50 or 100 MB of swap if I enable it.
I see, I don't have a Megrez at my desk, only in the build system. I only have P550 as my "workhorse".
PS: I made a typo above - the P550 I was referring to was the SiFive "HiFive Premier P550". But based on your HN profile text, you must've guessed it as much :)
Arm had 40 years to be where it is today. RISC-V is 15 years old. Some more patience is warranted.
Assuming they will keep their word, later this year Tenstorrent is supposed to ship their RVA23-based server development platform[1]. They announced[2] it at last year's NA RISC-V Summit. Let's see.
The ball is in the court of hardware vendors to cook some high-end silicon.
Great point; I only know about MIPS legacy vaguely. As you imply, don't listen to the "hype-sters" but pay attention to what silicon is being produced.
This is why felix has been building the risc-v archlinux repositories[1] using the Milk-V Pioneer.
I think the ban of SOPHGO is partly to blame for the slow development.[2] They had the most performant and interesting SOCs. I had a bunch of pre-orders for the Milk-V Oasis before it was cancelled. It was supposed to come out a while ago, using the SG2380, supposedly much more performant than the Milk-V Titan mentioned in the article (which still isn't out).
It was also SOPHGO's SOCs that powered the crazy cheap/performant/versatile Milk-V DUO boards. They can switch between ARM and RISC-V architectures.
I won't pretend to understand the geo-politics or rulings.
What I do know is since the ban, all ongoing products featuring SOPHGO SOCs were cancelled, and I haven't seen any products featuring them since. The SOPHGO forums have also closed down.
The Milk-V Oasis would have had 16 cores (SG2380 w/ SiFive P670), it was replaced by the Milk-V Megrez with just 4 cores (SiFive P550) for around the same price. The new Milk-V Titan has only 8. We're slowly catching up, but the performance is now one or two years behind what it could've been.
The SG2380 would've been the first desktop ready RISC-V SOC at an affordable price. I think it's still the only SOC made that used the SiFive P670 core.
Is there a simple explanation why RISC-V software has to be built on a RISC-V system? Why is it so hard for compilers to compile for a different architecture? The general structure of the target architecture lives inside the compiler code and isn’t generated by introspecting the current system, right?
Cross-compiling an entire distribution requires the distribution to be prepared for it. That is not a problem when you use OpenEmbedded/Yocto or Buildroot to build it, but it gets complicated with distributions that are built natively.
Fedora does not have a way to cross compile packages. The only cross compiler available in the repositories is a bare-metal one. You can use it to build firmware (EDK2, U-Boot) or the Linux kernel. But nothing more.
Then there is the other problem: testing. What is the point of a successful build if it does not work on target systems? Part of each Fedora build is running the testsuite (if the packaged software has one). You should not run it in QEMU, so each cross-build would need to connect to a target system, upload build artifacts and run tests. Overcomplicated.
Native builds let us test whether the distribution is ready for any kind of use. I have used an AArch64 desktop daily for almost a year now. But it is not a "4core/16GB ram SBC" but rather the "server-as-a-desktop" kind (80 cores, 128 GB RAM, plenty of PCI-Express lanes). I build software on it, write blog posts, watch movies, etc., and can emulate other Fedora architectures to do test builds.
A hardware architecture that is slow today can be fast in the future. In 2013, building Qt4 for Fedora/AArch64 took days (we used software emulators). Now it takes 18 minutes.
Underspecified build dependencies that use libraries/config on your host OS rather than the target system
You can solve this on a per language basis, but the C/C++ ecosystem is messy.
So people use VMs or real hardware of the target arch to not have to think about it
Old compilers tended to make it a compile-time switch which backends were included, probably because backends were "huge", so they were left out. (The insn lookup table in GCC took ages to generate and compile.) And of course all development environments running on Windows assumed x86 was the only architecture.
With LLVM existing, cross-compiling is not a problem anymore, but it means you can't run tests without an emulator. So it might just be easier to do it all on the target machine.
Cross-building is possible, but it's rather useful to be able to test the software you just built... And often enough, tests take more resources than the build.
The cross-compiler part itself is easy, but getting all the build scripting of tens of thousands of Fedora packages to work perfectly for cross-compiling would be a lot of work.
There are lots of small issues (libraries or headers not being found, wrong libraries or headers being found, build scripts trying to run the binaries they just built, wrong compiler being used, wrong flags being used, etc.) when trying to cross-compile arbitrary software.
All fixable (cross-compiling entire distributions is a thing), but a lot of work and an extra maintenance burden.
Native builds are always a safer/more reliable path to take than cross-compiling, which usually requires solid native builds to be operational before the cross environment can be reliably trusted.
It's a bootstrapping chain of priority. Once a native build regime is set in stone, cross compiling harnesses can be built to exploit the beachhead.
I have saved many a failing project's budget and deadline by just putting the compiler onboard and obviating the hacky scaffolding usually required for reliable cross-compiling at the beginning stages of a new architecture project, and I suspect this is the case here too ..
There was a Mastodon post some time back (~1y?) where someone realized that the fastest RISC-V hardware they could get was still slower than running it on QEMU.
That's not how it usually works :\
RISC-V is certainly spreading across niches, but performant computing is not one of them.
Edit: lol the author mentions the same! Perhaps they were the source of the original Mastodon post I'm thinking of.
The Milk-V Pioneer breaks that barrier, but it's expensive. And the RISC-V architecture it uses is now old; the company that developed it was sanctioned by the US and is now dead.
I'd guess that the issue is running the `%install` and `%check` stages of the .spec file. The Python library rpy (to pull a random example from Marcin's PRs) runs rpy's pytest test suite and had to be modified to avoid running vector tests on RISC-V.
Obviously a solvable problem to split build and test but perhaps the time savings aren't worth the complexity.
Maybe the tests could be run with user-mode qemu instead of the whole thing running under qemu or on RISC-V hardware. Could possibly be more or less seamless with binfmt_misc being set up in the builders.
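A minimal Python sketch of that split: build however you like, then run the test binaries directly when the host matches the target and through user-mode QEMU otherwise. `qemu-riscv64` and its `-L` option (prefix for the target's ELF interpreter and libraries) are real; the sysroot path and helper names are assumptions. With binfmt_misc registered, the prefix would be unnecessary because the kernel hands foreign binaries to QEMU itself. Note the Fedora concern raised earlier in the thread still applies: emulated test runs may not reflect real hardware.

```python
import platform
import subprocess
from typing import List, Optional


def wrap_test_command(cmd: List[str], target: str = "riscv64",
                      host: Optional[str] = None,
                      sysroot: str = "/usr/riscv64-linux-gnu") -> List[str]:
    """Return cmd unchanged on a native host, else run it under user-mode QEMU.

    sysroot is a hypothetical path holding the target's dynamic linker and
    shared libraries; qemu-user's -L option points the emulator at it.
    """
    host = host or platform.machine()  # e.g. "x86_64", "aarch64", "riscv64"
    if host == target:
        return list(cmd)  # native host: execute the tests directly
    return [f"qemu-{target}", "-L", sysroot, *cmd]


def run_tests(cmd: List[str]) -> int:
    """Run the (possibly wrapped) test command and return its exit status."""
    return subprocess.run(wrap_test_command(cmd), check=False).returncode
```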
Near as I know, Fedora prefers native compilation for the builds.
Your question made me look up Arm's history in Fedora, and I came upon this 2012 LWN thread[1]. There's some discussion against cross-compilation already back then.
Yocto, which we use at work, manages it just fine to build a whole embedded Linux distro. So I don't see why Fedora couldn't make it work if they wanted. You could even scp over the test suites to run that on native systems if you wanted.
Are you sure you are comparing apples with apples here?
The fact that i686 is 14% faster than x86_64 is a little suspicious, because usually the same software runs _faster_ on x86_64 (despite the increased memory use) thanks to a larger register set, an optimized ABI, and more vector instructions.
Of course, if you are compiling an i686 binary on i686, and an x86_64 binary on x86_64, then the compilers aren't really doing the same work, since their output is different. I'm not a compiler expert, but I could imagine that compiling x86_64 binaries is intrinsically slower than for i686 for a variety of reasons. For example, x86_64 is mostly a superset of i686, so a compiler has way more instructions to consider, including potential optimizations using e.g. SIMD instructions that don't exist on i686 at all. Or a compiler might assume a larger instruction cache size, by default, and do more unrolling or inlining when compiling for x86_64. And so on.
In that case, compiling on x86_64 is slower not because the hardware is bad but because the compiler does more work. Perhaps something similar is happening on RISC-V.
This article is being discussed on another forum where kernel build times are being compared for different RISC-V hardware. The conclusion there was that, if a BananaPi-F3 is taking 143 minutes to compile binutils, the SpacemiT K3 will build it in 36 minutes using its X100 cores (half its cores).
That is the same as the time he quotes for the unidentified Aarch64 hardware.
Which makes this a pretty funny article.
I do not have a K3 to confirm. I am hoping to pick one up when it becomes more widely available next month.
What kind of ancient arm hardware are they using here?
On a related note, SoC companies need to get their act together and start using the latest arm cores. Even the mid range cores of 1-2 years ago show a huge leap in performance:
>What kind of ancient arm hardware are they using here?
I think that's the point being made here. ARM in the 2000s was not known to be fast, now it is.
RISC-V being slow isn't an inherent characteristic of the ISA, it only tells you about the quality of its implementations. And said implementations will only improve if corporations are throwing capital at it (see: Apple, Qualcomm, etc.)
i. llvm presentation can thrash caches if setup wrong (given the plethora of RISC-V fragmented versions, most compilers won't cover every vanity silicon.)
ii. gcc is also "slow" in general, but is predictable/reliable
iii. emulation is always slower than kvm in qemu
It may seem silly, but I'd try a gcc build with -O0 flag, and a toy unit test with -S to see if the ASM is actually foobar. One may have to force the -mtune=boom flag to narrow your search. Best regards =3
If I'm reading their chart right, they have barely half as much memory for their RISC-V machine compared to any of the others? I don't know enough to know whether it's actually bottlenecked by memory, but it's a bit odd to claim it's slower, give those numbers, and not say anything about it. I'd hope they ruled that out as the source of the discrepancy, but it's hard to tell without confirmation.
That sounds a lot less "RISC-V is slow" and more like "the most money I'm willing to spend on a RISC-V machine is low, but the more powerful ones may or may not be as slow". I guess that doesn't make a particularly compelling headline.
The reason that he does not tell us what hardware he is using is because none of these times are for a single system building binutils. I think he is using a mix of systems and then doing some kind of averaging to tell us what an individual system would look like.
For some kind of hardware, all the systems they have would be the fastest that architecture offers, like with i686 I expect. While others are going to be a mix of old and new, like x86-64.
For RISC-V, the latest gen hardware is about as fast as the numbers he quotes for Aarch64. To be clear, the fastest ARM is still faster than the fastest RISC-V. But the numbers he quotes make no sense for something like a SpacemiT K3.
But if you are using RISC-V systems from two years ago in your build cluster, they will as he says be "Sloooow". But that shows how fast RISC-V is improving. It makes no sense to publish this article now.
At least, he should reveal what hardware he is talking about. His chart makes no sense (for most of the platforms).
Question: While you would want any official arch built natively, maybe an interim stage of emulated vm builds for wip/development/unsupported architectures would still be preferable in this case?
Comparing the tradeoffs:
* Packages disabled and not built because of long build times.
* Packages built and automated tests run on inaccurately emulated vms (NOT cross compiled). Users can test. It might be broken.
It's an experimental arch, maybe the build cluster could be experimental too?
FWIW, check out dockcross/linux-riscv32 and dockcross/linux-riscv64 if compilation itself is your problem.
I set up a CopyParty server on a headless RISC-V SBC and it was a breeze. Just get the packages, do the thing, move on. Obviously it depends on your needs, but maybe you're not using the right workflow and are blaming the tools instead.
Unrelated to the post's point but: Why does x86 build faster than x86_64? Presumably they used the same exact hardware, or at least the exact same number of cores and memory, yet the build time is more than 10% faster in x86. Is there some sort of overhead for x86_64 that I'm not seeing?
On benchmarks, for more precise details, I recommend the RISC-V Vector (RVV) benchmarks[1], maintained by Olaf Bernstein. He only covers the Vector stuff, but in great depth.
Just out of interest, why aren't they cross compiling RISC-V? I thought that was common practice when targeting lower performing hardware. It seems odd to me that the build cycle on the target hardware is a metric that matters.
Interesting that it's mandated as native - I'm really not sure of the logic behind this (I've worked in the embedded world, where such stuff is not only normal, but the only choice). I'll do some digging and see if I can find the thought process behind this.
OK, I'll bite. If this is a truly competitive core - I don't claim enough personal expertise to judge - does anyone fab and sell it? There should be a business case if it is.
This. While I doubt that there will be a good (whatever that means) desktop risc-v CPU anytime soon, I do think that it will eventually catch up in embedded systems and special applications. Maybe even high core count servers.
It just takes time, people who believe in it and tons of money. We'll see where the journey goes, but I am a big risc-v believer
If the builds are slow, build accelerators can help a lot. Ccache would work for sure and there is also firebuild, that can accelerate the linker phase and many other tools in builds.
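The core idea behind a compiler cache like ccache can be sketched in a few lines of Python: hash everything that determines the output (compiler identity, flags and the preprocessed source) and reuse the stored object when that hash has been seen before. This is a toy model with a hypothetical in-memory store and a stand-in `compile_fn`, not ccache's actual implementation or on-disk format.

```python
import hashlib
from typing import Callable, Dict, List


class ToyCompileCache:
    """Toy content-addressed compile cache (illustrative, not ccache itself)."""

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}  # hash -> "object file" bytes
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(compiler: str, flags: List[str], preprocessed: str) -> str:
        """Hash every input that can change the compiler's output."""
        h = hashlib.sha256()
        for field in (compiler, "\0".join(flags), preprocessed):
            data = field.encode()
            h.update(len(data).to_bytes(8, "little"))  # length prefix keeps fields unambiguous
            h.update(data)
        return h.hexdigest()

    def compile(self, compiler: str, flags: List[str], preprocessed: str,
                compile_fn: Callable[[str], bytes]) -> bytes:
        """Return a cached object on a hit; otherwise compile and remember it."""
        k = self.key(compiler, flags, preprocessed)
        if k in self.store:
            self.hits += 1
            return self.store[k]
        self.misses += 1
        obj = compile_fn(preprocessed)  # the slow step a cache hit skips
        self.store[k] = obj
        return obj
```

On a slow board the payoff is on rebuilds: unchanged translation units with unchanged flags skip the compiler entirely.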
Hey! I get this is a throwaway account so you might not answer, but I really, really don't like opening an article and having the first thing I see in a thread be someone calling the author a slur. There are ways of expressing insult without bringing intellectual disabilities into the mix.
In some cases RISC-V ISA spec is definitely the one to blame:
1) https://github.com/llvm/llvm-project/issues/150263
2) https://github.com/llvm/llvm-project/issues/141488
Another example is the hard-coded 4 KiB page size, which effectively kneecaps the ISA when compared against ARM.
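Part of why page size matters is TLB reach: the amount of memory a TLB can map without a miss is its entry count times the page size. A quick Python calculation, using a hypothetical 64-entry TLB (real TLB sizes vary by core), shows the effect of moving from 4 KiB pages to the 16 KiB or 64 KiB granules that AArch64 also supports.

```python
KIB = 1024


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of address space a TLB can cover: one page per entry."""
    return entries * page_size


ENTRIES = 64  # hypothetical TLB size; real cores differ
for page in (4 * KIB, 16 * KIB, 64 * KIB):
    print(f"{page // KIB:>2} KiB pages -> {tlb_reach(ENTRIES, page) // KIB} KiB reach")
```

This prints a reach of 256, 1024 and 4096 KiB respectively: the same TLB covers four or sixteen times as much memory with the larger pages.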
Unaligned load/store is a horrible feature to implement.
Page size can be easily extended down the line without breaking changes.
The first one is common across many architectures, including ARM, and the second is just LLVM developers not understanding how cmpxchg works
> 1) https://github.com/llvm/llvm-project/issues/150263
Huh? They have no idea what they are doing. If data is unaligned, the solution is memcpy, not compiler optimizations, also their hack of 17 loads is buffer overflow. Also not ISA spec problem.
> RISC-V will get there, eventually.
Not trolling: I legitimately don't see why this is assumed to be true. It is one of those things that is true only once it has been achieved. Otherwise we would be able to create super high performance Sparc or SuperH processors, and we don't.
As you note, Arm once was fast, then slow, then fast. RISC-V has never actually been fast. It has enabled surprisingly good implementations by small numbers of people, but competing at the high end (mobile, desktop or server) it is not.
I think the bigger question is does RISC-V need to be fast? Who wants to make it fast?
I'm a chip designer and I see people using RISC-V as small processor cores for things like PCIE link training or various bookkeeping tasks. These don't need to be fast, they need to be small and low power which means they will be relatively slow.
Most people on tech review sites only care about desktop / laptop / server performance. They may know about some of the ARM Cortex A series CPUs that have MMUs and can run desktop or smartphone Linux versions.
They generally don't care about the ARM Cortex M or R versions for embedded and real time use. Those are the areas where you don't need high performance and where RISC-V is already replacing ARM.
EDIT:
I'll add that there are companies that COULD make a fast RISC-V implementation.
Intel, AMD, Apple, Qualcomm, or Nvidia could redirect their existing teams to design a high performance RISC-V CPU. But why should they? They are heavily invested in their existing x86 and ARM CPU lines. Amazon and Google are using licensed ARM cores in their server CPUs.
What is the incentive for any of them to make a high performance RISC-V CPU? The only reason I can think of is that Softbank keeps raising ARM licensing costs and it gets high enough that it is more profitable to hire a team and design your own RISC-V CPU.
RISC-V doesn't have the pitfalls of Sparc (register windows, branch delay slots), largely because we learned from that. It's in fact a very "boring" architecture. There's no one that expects it'll be hard to optimize for. There are at least 2 designs that have taped out in small runs and have high end performance.
I don't think anybody suggests Oracle couldn't make faster SPARC processors, it's just that development of SPARC ended almost 10 years ago. At the time SPARC was abandoned, it was very competitive.
Because today, getting a fast CPU out isn't as much an engineering issue as it is about getting the investment to hire a world-class fab.
The most promising RISC-V companies today have not set out to compete directly with Intel, AMD, Apple or Samsung, but are targeting a niche such as AI, HPC and/or high-end embedded such as automotive.
And you can bet that Qualcomm has RISC-V designs in-house, but only making ARM chips right now because ARM is where the market for smartphone and desktop SoCs is. Once Google starts allowing RVA23 on Android / ChromeOS, the flood gates will open.
Fast, RVA23-compatible microarchitectures already exist. Everything high performance seems to be based on RVA23, which is the current application profile and comparable to ARMv9 and x86-64v4.
However, it takes time from microarchitecture to chips, and from chips to products on shelves.
The very first RVA23-compatible chips to show up will likely be the SpacemiT K3 SoC, due on development boards in April (i.e. next month).
More of them, and more performant ones, are coming this summer, such as a development board with the Tenstorrent Ascalon CPU in the form of the Atlantis SoC, which was taped out recently.
It is even possible such designs will show up in products aimed at the general public within the present year.
> Don't blame the ISA - blame the silicon implementations
That's true, but tautological.
The issue is that the RISC-V core is the easy part of the problem, and nobody seems to even be able to generate a chip that gets that right without weirdness and quirks.
The more fundamental technical problem is that things like the cache organization and DDR interface and PCI interface and ... cannot just be synthesized. They require analog/RF VLSI designers doing things like clock forwarding and signal integrity analysis. If you get them wrong, your performance tanks, and, so far, everybody has gotten them wrong in various ways.
The business problem is the fact that everybody wants to be the "performance" RISC-V vendor, but nobody wants to be the "embedded" RISC-V vendor. This is a problem because practically anybody who is willing to cough up for a "performance" processor is almost completely insensitive to any cost premium that ARM demands. The embedded space is hugely sensitive to cost, but nobody is willing to step into it because that requires that you do icky ecosystem things like marketing, software, debugging tools, inventory distribution, etc.
This leads to the US business problem which is the fact that everybody wants to be an IP vendor and nobody wants to ship a damn chip. Consequently, if I want actual RISC-V hardware, I'm stuck dealing with Chinese vendors of various levels of dodginess.
A pattern I've noticed for a very long time:
A lot of times the path to the highest performing CPU seems to be to optimize for power first, then speed, then repeat. That's because power and heat are a major design constraint that limits speed.
I first noticed this way back with the Pentium 4 "Netburst" architecture vs. the smaller x86 cores that became the ancestor of the Core architecture. Intel eventually ran into a wall with P4 and then branched high performance cores off those lower-power ones and that's what gave us the venerable Core architecture that made Intel the dominant CPU maker for over a decade.
ARM's history is another example.
I think the story is a bit more complicated. Core succeeded precisely because Intel had both the low-power experience with Pentium-M and the high-power experience with Netburst. The P4 architecture told them a lot about what was and wasn't viable and at what complexity. When you look at the successor generations from Core, what you see are a lot of more complex P4-like features being re-added, but with the benefits of improved microarch and fab processes. Obviously we will never know, but I don't think you would get to Haswell or Skylake in the form they were without the learning experience of the P4.
In comparison, I think Arm is actually a very strong cautionary tale that focusing on power will not get you to performance. Arm processors remained pretty poor performance until designers from other CPU families entirely (PowerPC and Intel) took it on at Apple and basically dragged Arm to the performance level they are today.
NetBurst was supposed to be the application of RISC principles to x86 taken to its extreme (ultra-long pipelines to reduce clock-to-clock delay, highest clock speed possible --- basically reducing work-per-clock and hoping that reduces complexity enough to increase clock speed to compensate.) The ALU was 16 bits, "double pumped" with the carry split between the two halves, which led to 32-bit ALU operations that don't carry between the lower and upper halves actually finishing a clock cycle faster than those with a carry.
https://stackoverflow.com/questions/45066299/was-there-a-p4-...
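A toy C model of the split-adder idea described above, assuming nothing about the real NetBurst circuitry: the 32-bit add is done as two 16-bit halves, and `carry_crossed` (a hypothetical out-parameter) marks the case where the upper half has to wait for the carry from the lower half, i.e. the case the comment says took the extra cycle.

```c
#include <stdint.h>

/* Illustrative software model only, not the actual P4 datapath:
 * a 32-bit add performed as two dependent 16-bit halves. */
static uint32_t add32_split16(uint32_t a, uint32_t b, int *carry_crossed)
{
    uint32_t lo = (a & 0xFFFFu) + (b & 0xFFFFu); /* low 16 bits first */
    uint32_t carry = lo >> 16;                   /* carry out of the low half */
    uint32_t hi = (a >> 16) + (b >> 16) + carry; /* high half consumes the carry */
    *carry_crossed = (int)carry;                 /* 1 if the halves were dependent */
    return ((hi & 0xFFFFu) << 16) | (lo & 0xFFFFu);
}
```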
Core evolved from the Banias (Centrino) CPU core which was based on P3, not P4. Banias used the front-side bus from P4 but not the cores.
Banias was hyper optimized for power, the mantra was to get done quickly and go to sleep to save power. Somewhere along the line someone said "hey what happens if we don't go to sleep?" and Core was born.
I don’t have a micro architecture background so I apologize if this is obvious — What do power and speed mean in this context?
Parallels to code design, where optimizing data or code size can end up having fantastic performance benefits (sometimes).
There's the ARM video from LowSpecGamer, where they talk about how they forgot to connect power to the chip, and it was still executing code anyway. According to Steve Furber, the chip was accidentally being powered from the protection diodes alone. So ARM was incredibly power efficient from the very beginning.
Marcin is working with us on RISC-V enablement for Fedora and RHEL, he's well aware of the problem with current implementations. We're hopeful that this'll be pretty much resolved by the end of the year.
If he expects it to be resolved by the end of the year (and I agree it likely will be), why is he writing a post like this?
Is this because Fedora 44 is going to beta?
> AND the software with no architecture-specific optimisations
The optimizations that'd be applied to ARM and MIPS would be equally applicable to RISC-V. I do not believe this is a lack of software optimization issue.
We are well past the days where hand written assembly gives much benefit, and modern compilers like gcc and llvm do nearly identical work right up until it comes to instruction emission (including determining where SIMD instructions could be placed).
Unless these chips have very very weird performance characteristics (like the weirdness around x86's lea instruction being used for arithmetic) there's just not going to be a lot of missed heuristics.
> The optimizations that'd be applied to ARM and MIPS would be equally applicable to RISC-V.
There's no carry bit, and no widening multiply (or MAC)
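To make the carry point concrete, here is a sketch in portable C. The `u128_limbs` type and `add_u128` are hypothetical names. With no flags register, the carry out of the low limb has to be recomputed with an unsigned comparison, which on RISC-V typically becomes an extra `sltu`. On x86 or Arm, a single add-with-carry instruction would handle it.

```c
#include <stdint.h>

/* Hypothetical 128-bit value stored as two 64-bit limbs. */
typedef struct { uint64_t lo, hi; } u128_limbs;

/* Add two 128-bit values without relying on a carry flag: the low-limb
 * sum wrapped around if and only if it is smaller than an operand. */
static u128_limbs add_u128(u128_limbs a, u128_limbs b)
{
    u128_limbs r;
    r.lo = a.lo + b.lo;
    uint64_t carry = r.lo < a.lo; /* recovered carry, typically an sltu on RISC-V */
    r.hi = a.hi + b.hi + carry;
    return r;
}
```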
If you care to read the article, they indeed do not blame the architecture but the available silicon implementations.
I did read it. A Banana Pi is not the fastest developer platform. The title is misleading.
BTW, it's quite impressive how the s390x is so fast per core compared to the others. I mean, of course it's fast - we all knew that.
And don't let IBM legal see this can be considered a published benchmark, because they are very shy about s390x performance numbers.
I keep checking in on Tenstorrent every few months thinking Keller is going to rock our world... losing hope.
At this point the most likely place for truly competitive RISC-V to appear is China.
But they didn't reflect that in a title like "current RISC-V silicon Is Sloooow" ...
Then how do you justify the title?
If you make a spec that the wider industry cannot effectively implement into quality products, it's the spec that's wrong. And that's true for anything - whether it's RISC-V, ipv6, Matter, USB-C and so on.
That's what makes writing specs hard - you need people who understand implementation challenges at the table, not dreaming architects and academics.
RISC-V lacks a bunch of really useful relatively easy to implement instructions and most extensions are truly optional so you can't rely on them. That's the problem if you let a bunch of academics turn your ISA into a paper mill.
In theory you can spend a lot of effort to make a flawed ISA perform, but it will be neither easy nor pretty e.g. real world Linux distros can't distribute optimised packages for every uarch from dual-issue in-order RV64GC to 8-wide OoO RV64 with all the bells and whistles. Only in (deeply) embedded systems can you retarget the toolchain and optimise for each damn architecture subset you encounter.
ARM was never a "speed demon"; it started out as a low power small-area core and clearly had more complexity and thought put into it than MIPS or RISC-V.
Over a decade ago: https://news.ycombinator.com/item?id=8235120
RISC-V will get there, eventually.
Strong doubt. Those of us who were around in the 90s might remember how much hype there was with MIPS.
I don't think you remember, but the first Archimedes smoked the just-launched Compaq 386s with a dedicated 387 coprocessor.
It was not designed to be one, but it ended up being surprisingly fast.
A couple of corrections (the blog-post is by a colleague, but I'm not speaking for Marcin! :))
First, we do have a recent 'binutils' build[1] with test-suites in 67 minutes (it was on Milk-V "Megrez") in the Fedora RISC-V build system. This is a non-trivial improvement over the 143-minute build time reported in the blog.
Second, the current fastest development machine is not the Banana Pi BPI-F3. If we consider what is reasonably accessible today, it is the SiFive "HiFive P550" (P550 for short) and the upcoming UltraRISC "DP1000", for which we have access to an eval board. And as noted elsewhere in this thread, in "several months" some RVA23-based machines should be available. (RVA23 == the latest ISA spec).
FWIW, our FOSDEM talk from earlier this year, "Fedora on RISC-V: state of the arch"[2], gives an overview of the hardware situation. It also has a couple of related poor man's benchmarks (an 'xz' compression test and a 'binutils' build without the test-suite on the above two boards -- that's what I could manage with the time I had).
Edit: Marcin's RISC-V test was done on a StarFive "VisionFive 2". This small board has its strengths (upstreamed drivers), but it is not known for its speed!
[1] https://riscv-koji.fedoraproject.org/koji/taskinfo?taskID=91...
[2] Slides: https://fosdem.org/2026/events/attachments/SQGLW7-fedora-on-...
> VisionFive 2
It's a good solid reliable board, but over three years old at this point (in a fast-moving industry) and the maximum 8 GB RAM is quite challenging for some builds.
Binutils is fine, but on recent versions of gcc it wants to link four binaries at the same time, with each link using 4 GB RAM. I've found this fails on my 16 GB P550 Megrez with swap disabled, but works quickly and uses maybe 50 or 100 MB of swap if I enable it.
On the VisionFive 2 you'd need to use `-j1` (or `-j2` with swap enabled) which will nearly double or quadruple the build time.
Or use a better linker than `ld`.
At least the LLVM build system lets you set the number of parallel link jobs separately to the number of C/C++ jobs.
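Here is a back-of-the-envelope C sketch of the RAM arithmetic above. The 4 GB per link comes from the comment. The 2 GB reserved for the OS and compile jobs is an assumption, and `max_link_jobs` is a hypothetical helper. LLVM exposes a similar knob as the CMake cache variable `LLVM_PARALLEL_LINK_JOBS`.

```c
/* Hypothetical helper: how many concurrent link jobs fit in RAM, assuming
 * each link needs mem_per_link_mb and reserved_mb is kept back for the OS
 * and compile jobs. Never returns fewer than 1 or more than cores. */
static unsigned max_link_jobs(unsigned ram_mb, unsigned reserved_mb,
                              unsigned mem_per_link_mb, unsigned cores)
{
    if (mem_per_link_mb == 0 || ram_mb <= reserved_mb)
        return 1;
    unsigned jobs = (ram_mb - reserved_mb) / mem_per_link_mb;
    if (jobs < 1)
        jobs = 1;      /* always allow one link, even if it has to swap */
    if (jobs > cores)
        jobs = cores;  /* no point exceeding the core count */
    return jobs;
}
```

Under those assumptions, a 16 GB board gets 3 parallel links rather than 4, and an 8 GB board gets 1. That roughly lines up with the `-j1` suggestion for the VisionFive 2.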
> I've found this fails on my 16 GB P550 Megrez with swap disabled but works quickly and uses maybe 50 or 100 MB of swap if I enable it.
I see, I don't have a Megrez at my desk, only in the build system. I only have P550 as my "workhorse".
PS: I made a typo above - the P550 I was referring to was the SiFive "HiFive Premier P550". But based on your HN profile text, you must've guessed it as much :)
Arm had 40 years to be where it is today. RISC-V is 15 years old. Some more patience is warranted.
Assuming they will keep their word, later this year Tenstorrent is supposed to ship their RVA23-based server development platform[1]. They announced[2] it at the last year's NA RISC-V Summit. Let's see.
The ball is in the court of hardware vendors to cook some high-end silicon.
[1] https://tenstorrent.com/ip/risc-v-cpu
[2] https://static.sched.com/hosted_files/riscvsummit2025/e2/Unl...
MIPS, which RISC-V is closely modeled after, is also roughly 4 decades old and was massively hyped in the early 90s as well.
Great point; I only know about MIPS legacy vaguely. As you imply, don't listen to the "hype-sters" but pay attention to what silicon is being produced.
AArch64 is just 15 years old, and shares pretty much nothing with 32-bit Arm apart from the name.
This is why felix has been building the risc-v archlinux repositories[1] using the Milk-V Pioneer.
I think the ban of SOPHGO is partly to blame for the slow development.[2] They had the most performant and interesting SOCs. I had a bunch of pre-orders for the Milk-V Oasis before it was cancelled. It was supposed to come out a while ago, using the SG2380, supposedly much more performant than the Milk-V Titan mentioned in the article (which still isn't out).
It was also SOPHGO's SOCs that powered the crazy cheap/performant/versatile Milk-V DUO boards. They have the ability to switch ARM/RISC-V architecture.
[1]: https://archriscv.felixc.at/
[2]: https://www.tomshardware.com/tech-industry/artificial-intell...
Can you articulate why you think this ban impacted anything and what you think the ban applies to?
I won't pretend to understand the geo-politics or rulings.
What I do know is since the ban, all ongoing products featuring SOPHGO SOCs were cancelled, and I haven't seen any products featuring them since. The SOPHGO forums have also closed down.
The Milk-V Oasis would have had 16 cores (SG2380 w/ SiFive P670), it was replaced by the Milk-V Megrez with just 4 cores (SiFive P550) for around the same price. The new Milk-V Titan has only 8. We're slowly catching up, but the performance is now one or two years behind what it could've been.
The SG2380 would've been the first desktop ready RISC-V SOC at an affordable price. I think it's still the only SOC made that used the SiFive P670 core.
Is there a simple explanation why RISC-V software has to be built on a RISC-V system? Why is it so hard for compilers to compile for a different architecture? The general structure of the target architecture lives inside the compiler code and isn’t generated by introspecting the current system, right?
Cross compilation of entire distributions requires such distributions to be prepared for it. That is not an issue when you use OpenEmbedded/Yocto or Buildroot to build them, but it gets complicated with distributions that are built natively.
Fedora does not have a way to cross compile packages. The only cross compiler available in repositories is bare-metal one. You can use it to build firmware (EDK2, U-Boot) or Linux kernel. But nothing more.
Then there is the other problem: testing. What is the point of a successful build if it does not work on target systems? Part of each Fedora build is running the test suite (if the packaged software has one). You should not run it in QEMU, so each cross-build would need to connect to a target system, upload build artifacts and run tests. Overcomplicated.
Native builds allow testing whether the distribution is ready for any kind of use. I have used an AArch64 desktop daily for almost a year now. But it is not a "4 core/16 GB RAM SBC" but rather the "server-as-a-desktop" kind (80 cores, 128 GB RAM, plenty of PCI-Express lanes). I build software on it, write blog posts, watch movies, etc. And I can emulate other Fedora architectures to do test builds.
A hardware architecture that is slow today can be fast in the future. In 2013 building Qt4 for Fedora/AArch64 took days (we used software emulators). Now it takes 18 minutes.
Underspecified build dependencies that use libraries/config on your host OS rather than the target system
You can solve this on a per language basis, but the C/C++ ecosystem is messy. So people use VMs or real hardware of the target arch to not have to think about it
Old compilers tended to make it a compile-time switch which backends were included, probably because backends were "huge", so they were left out. (The insn lookup table in GCC took ages to generate and compile.) And of course all development environments running on Windows assumed x86 was the only architecture.
With LLVM existing, cross-compiling is not a problem anymore, but it means you can't run tests without an emulator. So it might just be easier to do it all on the target machine.
Cross building is possible, but it's rather useful to be able to test the software you just built... And often enough, tests take more resources than the build.
The cross-compiler part itself is easy, but getting all the build scripting of tens of thousands of Fedora packages to work perfectly for cross-compiling would be a lot of work.
There are lots of small issues (libraries or headers not being found, wrong libraries or headers being found, build scripts trying to run the binaries they just built, wrong compiler being used, wrong flags being used, etc.) when trying to cross-compile arbitrary software.
All fixable (cross-compiling entire distributions is a thing), but a lot of work and an extra maintenance burden.
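Here is one concrete instance of "build scripts trying to run the binaries they just built", sketched in C with a hypothetical generator that writes a config header value. Plain `char` is signed on x86-64 Linux but unsigned on RISC-V Linux. If the build compiles and runs this generator on an x86-64 builder, the generated header is therefore wrong for the RISC-V target. The macro at the end shows the cross-safe pattern of letting the compiler that builds the code answer the question.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical build-time generator emitting one line of a config.h.
 * The answer describes the machine that compiled and ran this program,
 * which in a naive cross build is the build host, not the target. */
static int write_char_signedness(FILE *out)
{
    return fprintf(out, "#define CHAR_IS_SIGNED %d\n", CHAR_MIN < 0 ? 1 : 0);
}

/* Cross-safe alternative: evaluated by whichever compiler builds the
 * code that uses it, so a cross compiler answers for the target. */
#define CHAR_IS_SIGNED_AT_COMPILE_TIME (CHAR_MIN < 0)
```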
Native builds are always a safer/more reliable path to take than cross-compiling, which usually requires solid native builds to be operational before the cross environment can be reliably trusted.
Its a bootstrapping chain of priority. Once a native build regime is set in stone, cross compiling harnesses can be built to exploit the beachhead.
I have saved many a failing project's budget and deadline by just putting the compiler onboard and obviating the hacky scaffolding usually required for reliable cross compiling at the beginning stages of a new architecture project, and I suspect this is the case here too ..
Or they could fix cross compilation and then compile it on a normal x86_64 server
Fixing cross compilation is a huge undertaking. So much software needs to be patched to be properly cross-compilable.
There was a Mastodon post some time back (~1y?) where someone realized that the fastest RISC-V hardware they could get was still slower than running it on QEMU.
That's not how it usually works :\
RISC-V is certainly spreading across niches, but performant computing is not one of them.
Edit: lol the author mentions the same! Perhaps they were the source of the original Mastodon post I'm thinking of.
The Milk-V Pioneer breaks that barrier, but it's expensive. And the RISC-V architecture it uses is now old; the company that developed it was sanctioned by the US and is now dead.
Is cross compilation out of the question?
I'd guess that the issue is running the `%install` and `%check` stages of the .spec file. The Python library rpy (to pull a random example from Marcin's PRs) runs rpy's pytest test suite and had to be modified to avoid running vector tests on RISC-V.
Obviously a solvable problem to split build and test but perhaps the time savings aren't worth the complexity.
https://src.fedoraproject.org/rpms/rpy/pull-request/4#reques...
Maybe the tests could be run with user-mode qemu instead of the whole thing running under qemu or on RISC-V hardware. Could possibly be more or less seamless with binfmt_misc being set up in the builders.
Near as I know, Fedora prefers native compilation for the builds.
Your question made me look up Arm's history in Fedora and came up on this 2012 LWN thread[1]. There's some discussion against cross-compilation already back then.
[1] https://lwn.net/Articles/487622/
It's usually an enormous pain to set up. QEMU is probably the best option.
T2 manages to do it
https://t2linux.com/
Yocto, which we use at work, manages it just fine to build a whole embedded Linux distro. So I don't see why Fedora couldn't make it work if they wanted. You could even scp over the test suites to run them on native systems if you wanted.
Maybe there are issues I'm not aware of but using dockcross has made cross-compilation quite easy in my experience.
https://github.com/dockcross/dockcross
Depends on the language, it's pretty trivial with Go.
Are you sure you are comparing apples with apples here?
The fact that i686 is 14% faster than x86_64 is a little suspicious, because usually the same software runs _faster_ on x86_64 (despite the increased memory use) thanks to a larger register set, an optimized ABI, and more vector instructions.
Of course, if you are compiling an i686 binary on i686, and an x86_64 binary on x86_64, then the compilers aren't really doing the same work, since their output is different. I'm not a compiler expert, but I could imagine that compiling x86_64 binaries is intrinsically slower than for i686 for a variety of reasons. For example, x86_64 is mostly a superset of i686, so a compiler has way more instructions to consider, including potential optimizations using e.g. SIMD instructions that don't exist on i686 at all. Or a compiler might assume a larger instruction cache size, by default, and do more unrolling or inlining when compiling for x86_64. And so on.
In that case, compiling on x86_64 is slower not because the hardware is bad but because the compiler does more work. Perhaps something similar is happening on RISC-V.
It isn't crazy uncommon to see i686 be faster - usually it means you're memory bandwidth bound.
But yeah, it may mean the benchmark is not representative.
The x86-64 build runs about 50% more linker tests than the i686 build.
Does that page even say which RISC-V CPUs are being used that are slow? I couldn't see it, which seems a bit of pointless complaining.
> RISC-V builders have four or eight cores with 8, 16 or 32 GB of RAM (depending on a board).
Which boards are used specifically should not matter much. There's not much available.
Except for the Milk-V Pioneer, which has 64 cores and 128GB ram. But that's an older architecture and it's expensive.
This article is being discussed on another forum where kernel build times are being compared for different RISC-V hardware. The conclusion there was that, if a BananaPi-F3 takes 143 minutes to compile binutils, the SpacemiT K3 will build it in 36 minutes using its X100 cores (half its cores).
That is the same as the time he quotes for the unidentified Aarch64 hardware.
Which makes this a pretty funny article.
I do not have a K3 to confirm. I am hoping to pick one up when it becomes more widely available next month.
> Random mumblings of ARM developer ... RISC-V is sloooow
Old news. See also:
> Random mumblings of x86_64 developer ... ARM is sloooow
What kind of ancient arm hardware are they using here?
On a related note, SoC companies need to get their act together and start using the latest arm cores. Even the mid range cores of 1-2 years ago show a huge leap in performance:
https://sbc.compare/56-raspberry-pi-500-plus-16gb/101-radxa-...
>What kind of ancient arm hardware are they using here?
I think that's the point being made here. ARM in the 2000s was not known to be fast, now it is.
RISC-V being slow isn't an inherent characteristic of the ISA; it only tells you about the quality of its implementations. And said implementations will only improve if corporations throw capital at it (see: Apple, Qualcomm, etc.)
Any new hardware lags in compiler optimizations.
i. llvm presentation can thrash caches if setup wrong (given the plethora of RISC-V fragmented versions, most compilers won't cover every vanity silicon.)
ii. gcc is also "slow" in general, but is predictable/reliable
iii. emulation is always slower than kvm in qemu
It may seem silly, but I'd try a gcc build with -O0 flag, and a toy unit test with -S to see if the ASM is actually foobar. One may have to force the -mtune=boom flag to narrow your search. Best regards =3
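A possible toy input for the `-S` check, in C. The function name is hypothetical. It is a 32-bit rotate, which ties back to the bit-manipulation discussion upthread. Compile it to assembly at an optimisation level where idiom recognition runs (e.g. `-O2 -S`), once with `-march=rv64gc` and once with `-march=rv64gc_zbb`. The output should show whether the compiler emits a single rotate instruction or a shift/OR sequence; the exact output depends on the compiler version.

```c
#include <stdint.h>

/* Rotate left by n bits. Masking both shift counts keeps this free of
 * undefined behaviour for n == 0 and n >= 32; compilers commonly
 * recognise this pattern as a rotate. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << (n & 31)) | (x >> ((0u - n) & 31));
}
```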
If I'm reading their chart right, they have barely half as much memory for their RISC-V machine compared to any of the others? I don't know enough to know whether it's actually bottlenecked by memory, but it's a bit odd to claim it's slower, give those numbers, and not say anything about it. I'd hope they ruled that out as the source of the discrepancy, but it's hard to tell without confirmation.
I think it's mentioned clearly in the article.
> RISC-V builders have four or eight cores with 8, 16 or 32 GB of RAM (depending on a board)
> The UltraRISC UR-DP1000 SoC, present on the Milk-V Titan motherboard should improve situation a bit (and can have 64 GB ram).
RISC-V SOCs just typically don't support much ram. With the exception of the SG2042 which can take 128GB, but it's expensive, buggy and now old.
So I am sure it's a combination of low ram and low clockspeeds.
That sounds a lot less like "RISC-V is slow" and more like "the most money I'm willing to spend on a RISC-V machine is low, but the more powerful ones may or may not be as slow". I guess that doesn't make a particularly compelling headline.
I am going to make a wild guess here.
The reason that he does not tell us what hardware he is using is because none of these times are for a single system building binutils. I think he is using a mix of systems and then doing some kind of averaging to tell us what an individual system would look like.
For some kind of hardware, all the systems they have would be the fastest that architecture offers, like with i686 I expect. While others are going to be a mix of old and new, like x86-64.
For RISC-V, the latest gen hardware is about as fast as the numbers he quotes for Aarch64. To be clear, the fastest ARM is still faster than the fastest RISC-V. But the numbers he quotes make no sense for something like a SpacemiT K3.
But if you are using RISC-V systems from two years ago in your build cluster, they will as he says be "Sloooow". But that shows how fast RISC-V is improving. It makes no sense to publish this article now.
At least, he should reveal what hardware he is talking about. His chart makes no sense (for most of the platforms).
I updated blog post after reading comments from Matrix/Slack/Phoronix/HN/Lobster/etc. places.
- mentioned which board had 143 minutes, added info about time on Milk-V Megrez board
- added section 'what we need hw-wise for being in fedora'
- added link to my desktop post to point that it is aarch64, not x86-64
- wording around qemu to show that I use it locally only
> ... I can build the “llvm15” package in about 4 hours. Compare that to 10.5 hours on a Banana Pi BPI-F3 builder (it may be quicker on a P550 one).
That's....slow. What a huge pile of bloat.
Thanks for the post!
Question: While you would want any official arch built natively, maybe an interim stage of emulated vm builds for wip/development/unsupported architectures would still be preferable in this case?
Comparing the tradeoffs:
* Packages disabled and not built because of long build times.
* Packages built and automated tests run on inaccurately emulated VMs (NOT cross compiled). Users can test. It might be broken.
It's an experimental arch, maybe the build cluster could be experimental too?
The current hardware used is self-hosting mini-server grade, and certainly not on the latest silicon process. "Slow" is expected.
It is not the ISA, but the implementations and those horrible SDKs which needs to be adjusted for RISC-V (actually any new ISA).
RISC-V needs extremely performant implementations on the best silicon process; until then RISC-V _will be_ "slow".
Not to mention, RISC-V is a 'standard ISA': software written in assembly is more than appropriate in many cases.
FWIW checkout dockcross/linux-riscv32 and dockcross/linux-riscv64 if compilation itself is your problem.
I set up a CopyParty server on a headless RISC-V SBC and it was a breeze. Just get the packages, do the thing, move on. Obviously it depends on your needs, but maybe you're not using the right workflow and are blaming the tools instead.
Unrelated to the post's point but: Why does x86 build faster than x86_64? Presumably they used the same exact hardware, or at least the exact same number of cores and memory, yet the build time is more than 10% faster in x86. Is there some sort of overhead for x86_64 that I'm not seeing?
There's zero mention of hardware specs or cost beyond architecture and core counts... What is the purpose of this post?
Anyway, it's hardly surprising that a young ISA with not a 1/1000th of the investment of x86 or ARM has slower chips than them x)
On benchmarks, for more precise details, I recommend the RISC-V Vector (RVV) benchmarks[1], maintained by Olaf Bernstein. He only covers the Vector stuff, but in great depth.
[1] https://camel-cdr.github.io/rvv-bench-results/
Just out of interest, why aren't they cross compiling RISC-V? I thought that was common practice when targeting lower performing hardware. It seems odd to me that the build cycle on the target hardware is a metric that matters.
Please skim the thread :) We've already discussed it twice. Fedora "mandates" native builds.
Build time on target hardware matters when you're re-building an entire Linux distribution (25000+ packages) every six months.
I failed to find this on my skim, my bad :(
Interesting that it's mandated as native - I'm really not sure of the logic behind this (I've worked in the embedded world, where such stuff is not only normal but the only choice). I'll do some digging and see if I can find the thought process behind this.
there are projects for making high performance RISC-V chips like this one https://github.com/OpenXiangShan/XiangShan
OK, I'll bite. If this is a truly competitive core - I don't claim enough personal expertise to judge - does anyone fab and sell it? There should be a business case if it is.
If I remember correctly,it was taped out by some company as some embedded core in a GPU?
I guess that may be the true use case for 'Open-Source' cores.
That being said, the advertised SPEC CPU2006 scores are close to an M1 in IPC.
Yeah it's a few years behind ARM, but not that many. Imagine trying to compile this on ARM 10 years ago. It would be similarly painful.
> Imagine trying to compile this on ARM 10 years ago
Cortex A57 is 14 years old and is significantly faster than the 9 year old Cortex A55 these RISC-V cores are being compared against.
So yes it's many years behind. Many, many years.
SpacemiT K3 is on par with Rockchip RK3588. So, about 4 years behind ARM.
Tenstorrent Atlantis (first Ascalon silicon) should ship in Q2/Q3 and be twice as fast. About as fast as Ryzen5. So, about 5 years behind AMD.
But even the K3 has faster AI than Apple Silicon or Qualcomm X Elite.
Current trend-lines suggest ARM64 and RISC-V performance parity before 2030.
This. While I doubt that there will be a good (whatever that means) desktop risc-v CPU anytime soon, I do think that it will eventually catch up in embedded systems and special applications. Maybe even high core count servers.
It just takes time, people who believe in it and tons of money. We'll see where the journey goes, but I am a big RISC-V believer.
Why? They have yet to show anything to believe in except perhaps the embedded space.
Couldn't it be caused by a slower compiler? E.g., what would be the difference when cross compiling the same code to aarch64 vs risc-v?
Is it slow because of the inherent design or because it's recent and not as optimised as x86 or arm ?
Why not cross compile in such case on better hardware? Then run tests on the native one.
I don't care as long as it keeps my soldering iron hot.
If the builds are slow, build accelerators can help a lot. Ccache would work for sure and there is also firebuild, that can accelerate the linker phase and many other tools in builds.
Why is it slow? I thought we have Rivos chips
They haven't produced any chips.
Rivos was acquired by Meta last year.
Windows is still much slower.
Hey! I get this is a throwaway account so you might not answer, but I really, really don't like opening an article and having the first thing I see in a thread be someone calling the author a slur. There are ways of expressing insult without bringing intellectual disabilities into the mix.
For future readers: throwaway27448's comment used to say something completely different, featuring the r-slur, and then immediately edited.
Ulrich Drepper, Lennart Poettering, this clown. Red Hat seems to have a skill of hiring savants with high technical and low social aptitude.
Is it RISC-V or bloated software full of layered abstractions?