Comment by jancsika

1 year ago

What's the cost of shuttling data in and out of SIMD land?

SIMD doesn’t operate on a separate memory space or anything like that. You just load data from normal memory into the SIMD registers, just like you would have to load it into the scalar registers if you wanted to operate on it with normal instructions.
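To make that concrete, here is a minimal sketch assuming an x86 target with SSE and a C compiler that ships `<xmmintrin.h>`. The helper name `add4` is made up for illustration. The point is that the loads and the store go through ordinary pointers into ordinary memory:

```c
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86 target */

/* Hypothetical helper: out[i] = a[i] + b[i] for i in 0..3. */
void add4(const float *a, const float *b, float *out) {
    __m128 va = _mm_loadu_ps(a);      /* unaligned load of 4 floats from normal memory */
    __m128 vb = _mm_loadu_ps(b);      /* same for the second operand */
    __m128 sum = _mm_add_ps(va, vb);  /* one instruction adds all 4 lanes */
    _mm_storeu_ps(out, sum);          /* store the 4 results back to normal memory */
}
```

Calling `add4(a, b, out)` on plain `float[4]` arrays is all it takes; there is no separate copy-in or copy-out step beyond the load and store themselves.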

  • Moving data from SIMD registers to scalar registers can be slow, though.

    • It depends. For SIMD float -> scalar float it is fast, as they operate on the same registers. If you are pulling out lane 0 you don't even need to do anything (just a type cast). For other lanes you need a shuffle.

      For SIMD integer -> scalar integer, the value has to move into a separate register, so there is a small penalty (3 cycles, IIRC).
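A rough sketch of the three cases above, again assuming x86 with SSE2 (`<emmintrin.h>`). The function names are hypothetical, and the actual cost of each depends on the CPU, so treat the comments as describing which instructions are involved rather than exact timings:

```c
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86 target */

/* Float lane 0: scalar floats live in the same registers, so this is
   essentially a reinterpretation of the low lane. */
float lane0_f32(__m128 v) {
    return _mm_cvtss_f32(v);
}

/* Float lane 2: shuffle the wanted lane down to lane 0 first. */
float lane2_f32(__m128 v) {
    __m128 s = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2)); /* broadcast lane 2 */
    return _mm_cvtss_f32(s);
}

/* Integer lane 0: the value has to be moved from the vector register
   to a general-purpose register. */
int lane0_i32(__m128i v) {
    return _mm_cvtsi128_si32(v);
}
```

For example, with `v = _mm_setr_ps(1, 2, 3, 4)`, `lane0_f32(v)` gives 1 and `lane2_f32(v)` gives 3.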