Comment by kragen

2 days ago

Yeah, obviously you do want to do the "byzantine" bitops, not just ship code that's vulnerable to timing side channels due to conditional jumps depending on secret data. But you can do that pretty easily on RISC-V, and it doesn't even cost much performance. It's four register-to-register instructions instead of the one you'd have with CMOV:

            .globl minop
    minop:  slt t0, a0, a1          # set t0 to "is a0 < a1?" (0 or 1)
            addi t0, t0, -1         # convert 0 to -1 (a0 ≥ a1) and 1 to 0 (a0 < a1)
            sub t1, a1, a0          # set t1 := a1 - a0
            and t1, t0, t1          # t1 is now either 0 or, if a0 ≥ a1, a1 - a0
            add a0, a0, t1          # transform a0 into a1 if a0 ≥ a1
            ret

Possibly what you meant by "byzantine bitops" is this version:

    minop:  slt t0, a0, a1
            addi t0, t0, -1
            xor t1, a1, a0          # set t1 := a1 ^ a0
            and t1, t0, t1
            xor a0, a0, t1          # transform a0 into a1 using xor
            ret

(http://canonical.org/~kragen/sw/dev3/minop.S http://canonical.org/~kragen/sw/dev3/testminop.c http://canonical.org/~kragen/sw/dev3/minoptests)

I'm interested in knowing if there's a faster way to do this! You could do it in one less instruction with a multiply, but it's pretty common for a multiply to take multiple cycles.

Apparently CMOV isn't such a big win for superscalar architectures, which is what you'd normally use when performance is critical. But I don't know enough about superscalar architectures to really understand that assertion. And, for low-power architectures, people are moving to shorter pipeline lengths, like Cortex-M0 (3 stages) to Cortex-M0+ (2 stages).

In general, the RISC-V standard doesn't make any guarantees about execution time at all. That's out of its scope.

It does actually - there's the Zkt extension which declares that some instructions have data independent execution times. (Basically bitwise ops, add, sub, shift and multiply.)

If there was a cmov instruction it could be included in Zkt.

That said I'm also not sure how much performance benefit it gets you. I think the answer isn't obvious and you'd definitely need to do simulations to see.

There are min/max instructions and zicond: https://github.com/riscvarchive/riscv-zicond/blob/main/zicon...

Most conditional instruxtion sequences are two instructions, cmov is:

    czero.eqz  rd, rs1, rc
    czero.nez  rtmp, rs2, rc
    or         rd, rd, r

  • Aha, thanks! That seems like a much less appealing proposal to me than a simple mv.eqz/mv.nez. I wonder if there are merits to it that aren't obvious to me.

    • The reason against a simple conditional move instruction was that this would require a 3R1W integer register file, while the current extension allows for an 2R1W one.

      I can definitely see a three-source conditional move being added in conjunction with more 3R1W instructions (see what the scalar efficiency SIG is doing). But my understanding is that the 2R1W design is desired for some of the wide, high-performance designs. (register file size grows quadratically with port count)

      1 reply →