Comment by nullc

8 months ago

Has RISC-V gained a cmov yet, or is security-critical code still left to do branch-and-pray or use byzantine bitops?

9 comments

nullc

I'm sure at least one proposed extension has a constant-time conditional move, but I don't know of a ratified extension that has one. But as T-Head demonstrated with the V extension, shipped silicon can implement non-ratified extensions, and as the fast interrupt handling in WCH's CH32V003 demonstrates, shipped silicon can extend the architecture in ways that haven't even been proposed as extensions. But I don't know of any shipped silicon with a constant-time conditional move, either.

For most people, though, using "byzantine" bitops in their security-critical code is less important than being able to run it on a processor that doesn't implement IME or other presumable US backdoors. (Huawei backdoors, though?)

nullc 8 months ago
Ending up with visible timing side channels where there otherwise wouldn't be any is not an awesome tradeoff, though. Like... vulnerable to NSA vs vulnerable to everyone? The former is probably preferable. And as you say, the RISC-V option may not be backdoor-free; it may just be an alternative backdoor.
Seems like such an unforced error too, particularly because CMOVs are extremely beneficial for performance in out-of-order, deeply pipelined architectures. Though at least the performance side can be answered with extended behavior; the security side needs guarantees (and ideally ones that aren't "this instruction sequence is constant time on some chips and variable time on others").
- kragen 8 months ago
  
  Yeah, obviously you do want to do the "byzantine" bitops, not just ship code that's vulnerable to timing side channels due to conditional jumps depending on secret data. But you can do that pretty easily on RISC-V, and it doesn't even cost much performance. It's four register-to-register instructions instead of the one you'd have with CMOV:
      .globl minop
  minop:
      slt  t0, a0, a1    # set t0 to "is a0 < a1?" (0 or 1)
      addi t0, t0, -1    # convert 0 to -1 (a0 ≥ a1) and 1 to 0 (a0 < a1)
      sub  t1, a1, a0    # set t1 := a1 - a0
      and  t1, t0, t1    # t1 is now either 0 or, if a0 ≥ a1, a1 - a0
      add  a0, a0, t1    # transform a0 into a1 if a0 ≥ a1
      ret
  Possibly what you meant by "byzantine bitops" is this version:
  minop:
      slt  t0, a0, a1
      addi t0, t0, -1
      xor  t1, a1, a0    # set t1 := a1 ^ a0
      and  t1, t0, t1
      xor  a0, a0, t1    # transform a0 into a1 using xor
      ret
  (http://canonical.org/~kragen/sw/dev3/minop.S http://canonical.org/~kragen/sw/dev3/testminop.c http://canonical.org/~kragen/sw/dev3/minoptests)
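For readers who don't read RISC-V assembly, here is a rough C rendering of the two listings above. It is only a sketch: the names `ge_mask`, `min_addsub`, and `min_xor` are made up for illustration and are not taken from the linked files, and it assumes 32-bit operands. The arithmetic is done in unsigned types so the subtraction can't hit signed-overflow undefined behavior. Note that C itself promises nothing about timing, so a compiler is free to turn this back into a branch; checking the generated code is still necessary.

```c
#include <stdint.h>

/* Hypothetical helper: all-ones when a >= b, zero when a < b.
   Mirrors the slt + addi -1 pair in the assembly. */
static uint32_t ge_mask(int32_t a, int32_t b)
{
    return (uint32_t)(a < b) - 1u;
}

/* min(a, b) via subtract / and / add, as in the first listing. */
int32_t min_addsub(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t delta = (ub - ua) & ge_mask(a, b); /* b - a, or 0 when a < b */
    return (int32_t)(ua + delta);               /* a + (b - a) == b */
}

/* min(a, b) via xor / and / xor, as in the second listing. */
int32_t min_xor(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t delta = (ub ^ ua) & ge_mask(a, b); /* b ^ a, or 0 when a < b */
    return (int32_t)(ua ^ delta);               /* a ^ (b ^ a) == b */
}
```

The final cast from `uint32_t` back to `int32_t` is implementation-defined for out-of-range values, but it wraps as expected on the usual two's-complement compilers.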
  I'm interested in knowing if there's a faster way to do this! You could do it in one less instruction with a multiply, but it's pretty common for a multiply to take multiple cycles.
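One plausible reading of the multiply idea, sketched in C: swap the comparison operands so the flag is 1 exactly when the second argument should win, then scale the difference by that flag. That would be a compare, a subtract, a multiply, and an add. The function name `min_mul` is hypothetical, and this is a guess at the intended sequence rather than something taken from the linked files. Whether the multiply itself runs in constant time depends on the core.

```c
#include <stdint.h>

/* min(a, b) using a 0/1 flag and a multiply instead of a mask. */
int32_t min_mul(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t take_b = (uint32_t)(b < a);       /* 1 when b < a, else 0 */
    return (int32_t)(ua + take_b * (ub - ua)); /* a, or a + (b - a) == b */
}
```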
  Apparently CMOV isn't such a big win for superscalar architectures, which is what you'd normally use when performance is critical. But I don't know enough about superscalar architectures to really understand that assertion. And, for low-power architectures, people are moving to shorter pipeline lengths, such as from Cortex-M0 (3 stages) to Cortex-M0+ (2 stages).
  In general, the RISC-V standard doesn't make any guarantees about execution time at all. That's out of its scope.
  