patchnull 5 days ago

These sorts of approximations (and more sophisticated methods) are fairly widely used in systems programming, as seen by the fact that Apple's asin is only a couple percent slower and sub-ulp accurate (https://members.loria.fr/PZimmermann/papers/accuracy.pdf). I would expect to get similar performance on non-Apple x86 using Intel's math library, which does not seem to have been measured, and significantly better performance while preserving accuracy using a vectorized library call.

The approximation reported here is slightly faster but only accurate to about 2.7e11 ulp. That's totally appropriate for the graphics use in question, but no one would ever use it for a system library; less than half the bits are good.
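
For anyone who wants to sanity-check the "less than half the bits" arithmetic, here is a minimal sketch of measuring error in ulps. It assumes the figure refers to double precision (53-bit significand); the helper names `ulp_of`, `error_in_ulps`, and `good_bits` are made up for illustration.

```cpp
#include <cmath>
#include <limits>

// Spacing between |x| and the next larger double, i.e. one ulp at x.
inline double ulp_of(double x) {
    x = std::fabs(x);
    return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}

// Error of `approx`, measured in ulps of the reference value `exact`.
inline double error_in_ulps(double approx, double exact) {
    return std::fabs(approx - exact) / ulp_of(exact);
}

// Rough number of correct leading significand bits for a double result
// (assumption: 53-bit significand, error given in ulps).
inline double good_bits(double ulps) {
    return ulps <= 1.0 ? 53.0 : 53.0 - std::log2(ulps);
}
```

Under those assumptions, `good_bits(2.7e11)` comes out at roughly 15 of 53 bits, which is where "less than half" comes from.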

Also worth noting that it's possible to go faster without further loss of accuracy--the approximation uses a correctly rounded square root, which is much more accurate than the rest of the approximation deserves. An approximate square root will deliver the same overall accuracy and much better vectorized performance.
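
A rough sketch of what that could look like, not the code from the article: `approx_sqrt` builds a square root from a hardware reciprocal-square-root estimate plus one Newton-Raphson step, and `approx_asin` is a stand-in that uses the well-known Abramowitz and Stegun 4.4.45 polynomial (absolute error around 7e-5). The function names and the choice of stand-in are my assumptions, and the non-SSE fallback exists only so the sketch compiles elsewhere.

```cpp
#include <cmath>
#include <limits>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Low-precision 1/sqrt(x) estimate (about 12 bits with rsqrtss on x86).
inline float rsqrt_estimate(float x) {
#if defined(__SSE__)
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
#else
    return 1.0f / std::sqrt(x);  // portable fallback, not actually "approximate"
#endif
}

// sqrt(x) ~= x * rsqrt(x), with one Newton-Raphson refinement of the estimate.
inline float approx_sqrt(float x) {
    // Zero, negative, NaN, and subnormal inputs are flushed to 0 in this sketch;
    // this also avoids inf * 0 = NaN at x == 0.
    if (!(x >= std::numeric_limits<float>::min())) return 0.0f;
    float y = rsqrt_estimate(x);
    y = y * (1.5f - 0.5f * x * y * y);  // one Newton-Raphson step for 1/sqrt(x)
    return x * y;
}

// Stand-in asin approximation (Abramowitz & Stegun 4.4.45), NOT the article's:
// asin(x) ~= pi/2 - sqrt(1 - x) * (a0 + a1*x + a2*x^2 + a3*x^3) for 0 <= x <= 1.
inline float approx_asin(float x) {
    const float ax = std::fabs(x);
    const float p =
        1.5707288f + ax * (-0.2121144f + ax * (0.0742610f + ax * -0.0187293f));
    const float r = 1.57079633f - approx_sqrt(1.0f - ax) * p;
    return std::copysign(r, x);  // asin is odd, so mirror for negative inputs
}
```

The polynomial's error dominates here by a couple of orders of magnitude, so swapping the correctly rounded `std::sqrt` for `approx_sqrt` should not change the overall accuracy in any way you could measure.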

Pannoniae 5 days ago
Yeah, the only big problem with approx. sqrt is that it's not consistent across systems, for example Intel and AMD implement RSQRT differently... Fine for graphics, but if you need consistency, that messes things up.
- stephencanon 5 days ago
  
  Newer rsqrt approximations (ARM NEON and SVE, and the AVX512F approximations on x86) make the behavior architectural so this is somewhat less of a problem (it still varies between _architectures_, however).
- def-pri-pub 5 days ago
  
  Wait, what? Do you have a resource I could read up on about that? That is moderately concerning if your math isn't portable across chips.
  
patchnull 5 days ago
[flagged]
- stephencanon 5 days ago
  
  For the asinf libcall on macOS/x86, my former colleague Eric Postpischil invented the novel (at least at the time, I believe) technique of using a Remez-optimized refinement polynomial following rsqrtss instead of the standard Newton-Raphson iteration coefficients, which allowed him to squeeze out just enough extra precision to make the function achieve sub-ulp accuracy. One of my favorite tricks.
  We didn't carry that algorithm forward to arm64, sadly, because Apple's architects made fsqrt fast enough that it wasn't worth it in scalar contexts.
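  
  A minimal sketch of the shape of that trick as I understand it, with hypothetical names (`RefineCoeffs`, `kNewton`, `refine_rsqrt`). With t = x*y*y, the standard Newton-Raphson step multiplies the estimate y by 1.5 - 0.5*t. A Remez fit over the range of t that the hardware estimate can actually produce would yield slightly different coefficients with a smaller worst-case error. The tuned coefficients themselves are not given in this thread, so only the Newton-Raphson pair is shown.
  
  ```cpp
  // Coefficients of the linear refinement polynomial c0 - c1 * t, t = x * y * y.
  struct RefineCoeffs {
      float c0;
      float c1;
  };

  // Standard Newton-Raphson coefficients for refining y ~= 1/sqrt(x).
  constexpr RefineCoeffs kNewton{1.5f, 0.5f};

  // One refinement step: y' = y * (c0 - c1 * x * y * y).
  // Passing minimax-tuned coefficients instead of kNewton is the idea described
  // above; such values are an assumption here and are not reproduced.
  inline float refine_rsqrt(float y, float x, RefineCoeffs c = kNewton) {
      return y * (c.c0 - c.c1 * x * y * y);
  }
  ```
  
  For example, `refine_rsqrt(0.49f, 4.0f)` pulls a 2% low estimate of 1/sqrt(4) to within about 3e-4 of 0.5 using the Newton-Raphson coefficients.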

def-pri-pub 5 days ago

I did scan some (major) open source games and graphics-related projects and found a few of them using `std::asin()`. I plan on submitting some patches.