Comment by ryao

7 hours ago

That is an approximation. If approximations are acceptable, then here is a trick you might like. In loops that call cosf(i * C) and/or sinf(i * C), where i is incremented by 1 on each iteration and C is some constant expression, you can call cosf() and sinf() once outside of the loop for the step angle C (and a second time for the starting angle if i starts at something other than 0 or 1), then use the angle addition formulas to accumulate the values inside the loop with only multiplications and additions. The loop will run significantly faster.

Even if you only need one of cosf() or sinf(), many CPUs calculate both values at the same time, so taking the other is free. If you only need single precision values, you can do the accumulation in double precision to avoid much of the rounding error you would accumulate by doing it in single precision.

This trick can be used to accelerate the RoPE relative positional encoding calculations used in inference for Llama 3 and likely others. I have done this and seen a measurable speedup, although these calculations are such a small part of inference that the overall improvement was small.