← Back to context

Comment by IshKebab

2 months ago

> if general-purpose HW with horribly-slow misaligned loads/stores came out from them

How is that different for RISC-V?

> I think it's a fairly reasonable expectation that all supported hardware instructions/behaviors (at least non-deprecated ones) are not slower than a software implementation

I agree! So just use misaligned loads if Zicclsm is supported. As you observed there's a feedback loop between what compilers output and what gets optimised in hardware. Since RVA23 hardware is basically non-existent at the moment you kind of have the opportunity to dictate to hardware "LLVM will use misaligned accesses on RVA23; if you make an RVA23 chip where this is horribly slow then people will laugh at you and assume it's some sort of silicon defect".

> How is that different for RISC-V?

RISC-V hardware with slow misaligned mem ops does exist to non-insignificant extent, and it seems not enough people have laughed at them, and instead compilers did just surrender and default to not using them.

> As you observed there's a feedback loop between what compilers output and what gets optimised in hardware.

Well, that loop needs to start somewhere, and it has already started, and started wrong. I suppose we'll see what happens with real RVA23 hardware; at the very least, even if it takes a decade for most hardware to support misaligned well, software could retroactively change its defaults while still remaining technically-RVA23-compatible, so I suppose that's good.

  • > RISC-V hardware with slow misaligned mem ops does exist to non-insignificant extent

    Only U74 and P550, old RV64GC CPUs.

    SiFive's RVA23 cores have fast misaligned accesses, as do all THead and SpacemiT cores.

    I can't imagine that all the Tenstorrent and Ventana and so forth people doing massively OoO 8-wide cores won't also have fast misaligned accesses.

    As a previous poster said: if you're targeting RVA23 then just assume misaligned is fast and if someone one day makes one that isn't then sucks to be them.

    • P550 is, like, what, only a year old? I suppose there has been some laughing at it at least.

      Also Kendryte K230 / C908, but only on vector mem ops, which adds a whole another mess onto this.

      I'd hope all the massive OoO will have fast misaligned mem ops, anything else would immediately cause infinite pain for decades.

      But of course there'll be plenty of RVA23 hardware that's much smaller eventually too, once it becomes a general expectation instead of "cool thing for the very-top-end to have".

      I do agree that it'd be reasonable to just assume fast misaligned ops, but for whatever reason gcc and clang just don't, and that's what we have for defaults.

      5 replies →

>So just use misaligned loads if Zicclsm is supported.

LLVM and GCC developers clearly disagree with you. In other words, re-iterating the previously raised point: Zicclsm is effectively useless and we have to wait decades for hypothetical Oilsm.

Most programmers will not know that the misaligned issue even exists, even less about options like -mno-strict-align. They just will compile their project with default settings and blame RISC-V for being slow.

RISC-V could've easily avoided all this mess by properly mandating misaligned pointer handling as part of the I extension.

  • Well, we don't necessarily have to wait for Oilsm; software that wants to could just choose to be opinionated and run massively-worse on suboptimal hardware. And, of course, once Oilsm hardware becomes the standard, it'd be fine to recompile RVA23-targeting software to it too.

    > RISC-V could've easily avoided all this mess by properly mandating misaligned pointer handling as part of the I extension.

    Rather hard to mandate performance by an open ISA. Especially considering that there could actually be scenarios where it may be necessary to chicken-bit it off; and of course the fact that there's already some questionability on ops crossing pages, where even ARM/x86 are very slow.

    • I am not saying that RISC-V should mandate performance. If anything, we wouldn't had the problem with Zicclsm if they did not bother with the stupid performance note.

      I would be fine with any of the following 3 approaches:

      1) Mandate that store/loads do not support misaligned pointers and introduce separate misaligned instructions (good for correctness, so its my personal preference).

      2) Mandate that store/loads always support misaligned pointers.

      3) Mandate that store/loads do not support misaligned pointers unless Zicclsm/Oilsm/whatever is available.

      If hardware wants to implement a slow handling of misaligned pointers for some reason, it's squarely responsibility of the hardware's vendor. And everyone would know whom to blame for poor performance on some workloads.

      We are effectively going to end up with 3, but many years later and with a lot of additional unnecessary mess associated with it. Arguably, this issue should've been long sorted out in the age of ratification of the I extension.

      3 replies →