Comment by camel-cdr
1 month ago
Ok, let's test it then!
For testing, I used a custom QEMU plugin to measure the dynamic instruction count, dynamic uop count, and dynamic instruction size. Every instruction with multiple register writebacks was counted as one uop per writeback, and to make the results more comparable, SIMD was disabled.
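To make the counting rule concrete, here is a minimal Python sketch of the tally. It is not the actual plugin: the `Insn` record and the `tally` function are hypothetical names, and treating instructions with no register writeback (stores, branches) as one uop is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Insn:
    mnemonic: str    # only for readability
    size: int        # encoded size in bytes
    reg_writes: int  # number of registers written back

def tally(trace):
    """Return (insns, uops, bytes) for an iterable of executed Insn records."""
    insns = uops = nbytes = 0
    for insn in trace:
        insns += 1
        # One uop per register writeback. Assumption: an instruction with
        # no writeback (store, branch) still costs one uop.
        uops += max(1, insn.reg_writes)
        nbytes += insn.size
    return insns, uops, nbytes

trace = [
    Insn("ldp x0, x1, [sp]", 4, 2),  # load pair: two writebacks
    Insn("ldr x2, [x3], #8", 4, 2),  # post-increment load: data + base
    Insn("str x2, [sp]", 4, 0),      # store: no writeback
    Insn("c.addi a0, 1", 2, 1),      # compressed RISC-V instruction
]
print(tally(trace))  # (4, 6, 14)
```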
I used this setup to run self-compiling single-file versions of chibicc (assembling) and tinycc (generating object file), which are small C compilers of 9K and 24K LOC respectively. Both compilers were cross-compiled using clang-22 and were benchmarked cross-compiling themselves to x86.
Let's look at the impact of -ftrapv first. In chibicc O3/O2/Os the dynamic uops increased due to -ftrapv for RISC-V by 5.3%/5.1%/6.7%, and for ARM by 5.1%/5.0%/6.4%. Interestingly, in tinycc it only increased for RISC-V by 1.6%/1.0%/1.0%, while ARM increased slightly more with 1.6%/2.0%/1.3%.
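For anyone who wants to recompute the overhead from the raw data, a small Python sketch follows. The function name `overhead` is made up for illustration. It reports the increase against both possible baselines, since the percentage depends on the denominator; the figures quoted above seem to line up with the second convention, at least for chibicc at O3.

```python
def overhead(base, trap):
    """Return the -ftrapv increase relative to (base run, trap run)."""
    delta = trap - base
    return delta / base, delta / trap

# uops for chibicc/clang-O3-rva23 and chibicc/clang-O3-rva23-trap
rel_base, rel_trap = overhead(449328186, 474623648)
print(f"relative to base: {rel_base:.1%}, relative to trap: {rel_trap:.1%}")
```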
In terms of dynamic instruction count, ARM needed to execute 6%/15% fewer instructions than RISC-V for chibicc/tinycc. In terms of uops, RISC-V needed to execute 6% more in tinycc, but ARM needed to execute 0.5% more in chibicc. The dynamic instruction size, which estimates the pressure on icache and fetch bandwidth, was 24%/10% lower in RISC-V for chibicc/tinycc.
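A rough Python sketch of the cross-ISA comparison, using only the O3 runs without -ftrapv from the raw data. The `runs` table and the `fewer` helper are illustrative names. Because this looks at a single optimisation level, the results need not match the rounded figures in the text exactly.

```python
# (insns, bytes) for the clang-O3 runs without -ftrapv, from the raw data
runs = {
    ("chibicc", "armv9"): (419886184, 1679544736),
    ("chibicc", "rva23"): (449328186, 1288202666),
    ("tinycc", "armv9"): (115189179, 460756716),
    ("tinycc", "rva23"): (137035509, 427878586),
}

def fewer(a, b):
    """Fraction by which a is lower than b."""
    return 1 - a / b

for prog in ("chibicc", "tinycc"):
    arm_insns, arm_bytes = runs[(prog, "armv9")]
    rv_insns, rv_bytes = runs[(prog, "rva23")]
    print(prog)
    print(f"  ARM executes {fewer(arm_insns, rv_insns):.1%} fewer insns")
    print(f"  RISC-V fetches {fewer(rv_bytes, arm_bytes):.1%} fewer bytes")
    # Average encoded size: fixed 4 bytes on ARM, lower with RVC on RISC-V
    print(f"  bytes/insn: ARM {arm_bytes / arm_insns:.2f}, "
          f"RISC-V {rv_bytes / rv_insns:.2f}")
```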
Note that this did not model any instruction fusion in RISC-V and only treated incrementing loads and load pairs as multiple uops (to mirror Apple Silicon).
If the only fusion pair you implement is adjacent compressed sp-relative stores, then RISC-V ends up with a lower uop count for both programs. They are trivial to implement because you can just interpret the two adjacent 16-bit instructions as a single 32-bit instruction, and compilers always generate them next to each other and in sorted order in function prologue code. You can do this directly in your RVC expander; it only adds minimal additional delay (zero with a trick), which is constant regardless of decode width.
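As a software model of that idea, the Python sketch below treats a 32-bit fetch chunk as two RV64 C.SDSP halves and reports a fused store pair when both decode and their offsets are adjacent. The function names are hypothetical, and the fusion condition (offsets exactly 8 bytes apart, in either order) is an assumption rather than a description of any particular core.

```python
def decode_c_sdsp(half):
    """Return (rs2, offset) if the 16-bit word is an RV64 C.SDSP, else None."""
    # C.SDSP: op[1:0] = 0b10, funct3[15:13] = 0b111
    if half & 0x3 != 0b10 or (half >> 13) & 0x7 != 0b111:
        return None
    rs2 = (half >> 2) & 0x1F
    imm_5_3 = (half >> 10) & 0x7  # inst[12:10] = uimm[5:3]
    imm_8_6 = (half >> 7) & 0x7   # inst[9:7]   = uimm[8:6]
    return rs2, (imm_8_6 << 6) | (imm_5_3 << 3)

def try_fuse(word):
    """Interpret a 32-bit chunk as two C.SDSP stores; return a fused pair or None."""
    first = decode_c_sdsp(word & 0xFFFF)  # lower half comes first in memory
    second = decode_c_sdsp(word >> 16)
    if first is None or second is None:
        return None
    # Assumption: only fuse stores to neighbouring 8-byte stack slots.
    if abs(first[1] - second[1]) != 8:
        return None
    return ("store_pair", first, second)

# sd ra, 8(sp) is 0xE406 and sd s0, 0(sp) is 0xE022
print(try_fuse(0xE022E406))
print(try_fuse(0x00000013))  # a 32-bit nop, not a pair of stores
```

In hardware this would be a pattern match on the 32-bit chunk in the RVC expander rather than two sequential decodes; the sketch only shows which bits take part in the check.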
Raw data:
chibicc/clang-O3-armv9: insns: 419886184 uops: 450136257 bytes: 1679544736
chibicc/clang-O3-armv9-trap: insns: 450205913 uops: 474206409 bytes: 1800823652
chibicc/clang-O3-rva23: insns: 449328186 uops: 449328186 bytes: 1288202666
chibicc/clang-O3-rva23-trap: insns: 474623648 uops: 474623648 bytes: 1375991094
chibicc/clang-O2-armv9: insns: 421810039 uops: 451501004 bytes: 1687240156
chibicc/clang-O2-armv9-trap: insns: 451642152 uops: 475084965 bytes: 1806568608
chibicc/clang-O2-rva23: insns: 449625081 uops: 449625081 bytes: 1286452180
chibicc/clang-O2-rva23-trap: insns: 473682134 uops: 473682134 bytes: 1369720036
chibicc/clang-Os-armv9: insns: 457841653 uops: 489902437 bytes: 1831366612
chibicc/clang-Os-armv9-trap: insns: 497189616 uops: 523323893 bytes: 1988758464
chibicc/clang-Os-rva23: insns: 486216287 uops: 486216287 bytes: 1363135906
chibicc/clang-Os-rva23-trap: insns: 520889604 uops: 520889604 bytes: 1473263784
tinycc/clang-O3-armv9: insns: 115189179 uops: 126358884 bytes: 460756716
tinycc/clang-O3-armv9-trap: insns: 117139555 uops: 128361973 bytes: 468558220
tinycc/clang-O3-rva23: insns: 137035509 uops: 137035509 bytes: 427878586
tinycc/clang-O3-rva23-trap: insns: 139248009 uops: 139248009 bytes: 436548988
tinycc/clang-O2-armv9: insns: 115184314 uops: 126568360 bytes: 460737256
tinycc/clang-O2-armv9-trap: insns: 117651772 uops: 129195276 bytes: 470607088
tinycc/clang-O2-rva23: insns: 137362294 uops: 137362294 bytes: 420468990
tinycc/clang-O2-rva23-trap: insns: 138649335 uops: 138649335 bytes: 428680948
tinycc/clang-Os-armv9: insns: 130661270 uops: 144718253 bytes: 522645080
tinycc/clang-Os-armv9-trap: insns: 132574148 uops: 146565708 bytes: 530296592
tinycc/clang-Os-rva23: insns: 152798316 uops: 152798316 bytes: 452181732
tinycc/clang-Os-rva23-trap: insns: 154232874 uops: 154232874 bytes: 458257882
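If you want to load these lines into a script, a small Python parser along these lines should work. The regular expression and the `parse` function are only a sketch written against the line format shown above.

```python
import re

LINE = re.compile(
    r"(?P<prog>\w+)/clang-(?P<opt>O\w)-(?P<isa>\w+)(?P<trap>-trap)?: "
    r"insns: (?P<insns>\d+) uops: (?P<uops>\d+) bytes: (?P<bytes>\d+)"
)

def parse(line):
    """Parse one raw-data line into a dict, or return None if it does not match."""
    m = LINE.fullmatch(line.strip())
    if m is None:
        return None
    row = m.groupdict()
    row["trap"] = row["trap"] is not None  # True for the -ftrapv builds
    for key in ("insns", "uops", "bytes"):
        row[key] = int(row[key])
    return row

row = parse("tinycc/clang-Os-rva23-trap: insns: 154232874 "
            "uops: 154232874 bytes: 458257882")
print(row)
```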