Comment by camel-cdr

2 years ago

Link to the open-source processor implementation: https://github.com/OpenXiangShan/XiangShan/tree/nanhu

You can download and simulate it on regular hardware.

I ran a few micro benchmarks on XiangShanV2 (Nanhu, the one in the laptop) and XiangShanV3, the next generation of their implementation:

    integer micro benchmark from the XiangShan repo:
                                 Zen1 1600x     XiangShanV2     XiangShanV3
    Quick sort:                16833 cycles    11122 cycles    10582 cycles
    Queen placement:           56606 cycles    59712 cycles    49912 cycles
    Brainf**k interpreter:    132821 cycles   113686 cycles    52676 cycles
    Fibonacci number:           7473 cycles     5999 cycles     2763 cycles
    Eratosthenes sieve:         5364 cycles     3140 cycles     2037 cycles
    A* 15-puzzle search:       20459 cycles    14626 cycles    11018 cycles
    Dinic's maxflow algorithm: 12357 cycles    11184 cycles     6174 cycles
    Lzip compression:           7140 cycles     5932 cycles     2289 cycles
    Suffix sort:               16316 cycles    14967 cycles    11256 cycles
    MD5 digest:                 5882 cycles     3793 cycles     1997 cycles
    Total:                    281251 cycles   244161 cycles   150704 cycles
    
    scalar fp32 mandelbrot 64x64 with 64 iterations:
    Zen1 1600x:  1264882 cycles
    XiangShanV2: 1361856 cycles
    XiangShanV3: 1011363 cycles

The Ryzen 1600X is my current desktop and the computer I ran the RTL simulation on.

At the same clock frequency, XiangShanV2 is quite competitive with the Zen1 CPU. However, it doesn't implement the RISC-V vector extension, so it will be a lot slower in any SIMD workload. The RuyiBook is supposed to clock at 2.5GHz, but there were slides saying it can go up to 2.8GHz, while the 1600X can go up to 3.7GHz.
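To make the per-clock versus wall-clock distinction concrete, here is a rough sketch that takes the "Total" row of the integer benchmark and scales it by the clock frequencies mentioned in this comment. It assumes cycle counts stay the same at those clocks, which ignores memory latency and everything else that doesn't scale with the core, so treat the output as a back-of-the-envelope estimate. The 3.0GHz figure for XiangShanV3 is a design target, not a measured clock, and the function names are made up for illustration.

```python
# Total cycle counts copied from the "Total" row of the integer micro benchmark.
TOTAL_CYCLES = {
    "Zen1 1600X": 281251,
    "XiangShanV2": 244161,
    "XiangShanV3": 150704,
}

# Clock frequencies in GHz as quoted in the comment.
# Assumption: XiangShanV3 uses its 3GHz design target, not a measured clock.
CLOCK_GHZ = {
    "Zen1 1600X": 3.7,
    "XiangShanV2": 2.5,
    "XiangShanV3": 3.0,
}


def per_clock_speedup(baseline: str, other: str) -> float:
    """How many times fewer cycles `other` needs than `baseline`."""
    return TOTAL_CYCLES[baseline] / TOTAL_CYCLES[other]


def runtime_us(name: str) -> float:
    """Estimated runtime in microseconds at the quoted clock.

    cycles / (GHz * 1e9) seconds == cycles / (GHz * 1e3) microseconds.
    Assumes the cycle count does not change with frequency.
    """
    return TOTAL_CYCLES[name] / (CLOCK_GHZ[name] * 1e3)


if __name__ == "__main__":
    for name in TOTAL_CYCLES:
        print(
            f"{name:12s} "
            f"per-clock vs Zen1: {per_clock_speedup('Zen1 1600X', name):.2f}x  "
            f"est. runtime: {runtime_us(name):.1f} us"
        )
```

Under these assumptions XiangShanV2 needs fewer cycles than the 1600X but still takes longer in wall-clock time because of the lower clock, while XiangShanV3 at its target frequency would come out ahead on both.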

XiangShanV3 is a lot faster, and does implement the RISC-V vector extension, as well as the hypervisor extension. They also target a 3GHz frequency.

Here is a recent presentation on XiangShanV2's microarchitectural implementation details: https://raw.githubusercontent.com/OpenXiangShan/XiangShan-do...

There were a few talks at RISC-V Summit China regarding XiangShanV3 implementation details. Here is a recording; look at the second Day 2, if the clunky interface works for you: https://www.c114.com.cn/live/t850.html

They are also presenting at this year's Hot Chips in a few days.

No SIMD is a no-go for most devs.

  • Yeah, it won't be usable for people who want to optimize SIMD support for RISC-V. However, it does have a few extensions we haven't seen in hardware yet, like all the scalar cryptography extensions. And it should perform quite similarly to the Raspberry Pi 5.