
Comment by dapperdrake

1 year ago

-ffast-math and -Ofast are inadvisable on principle:

Tl;dr: python gevent messes up your x87 float registers (yes.)

https://moyix.blogspot.com/2022/09/someones-been-messing-wit...

I disagree with "on principle." There are flaws in the design of IEEE 754 and omitting strict adherence for the purposes of performance is fine, if not required for some applications.

For example, recursive filters (even the humble averaging filter) will suffer untold pain without enabling DAZ/FTZ mode.
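A minimal sketch of that failure mode (names are illustrative; the decay loop stands in for any recursive filter): with |a| < 1 and zero input, the filter state decays straight into the subnormal range, where arithmetic can be an order of magnitude slower on many x86 cores unless DAZ/FTZ is enabled.

```c
#include <float.h>

/* First-order recursive filter with zero input: y[n] = a * y[n-1].
 * With |a| < 1 the state decays toward zero and, without FTZ, spends
 * many iterations in the slow subnormal range before it finally
 * flushes to 0. */
float iir_decay(float y, float a, int steps) {
    for (int i = 0; i < steps; i++)
        y = a * y;
    return y;
}
```

With a = 0.5 and y0 = 1, the state drops below FLT_MIN (2^-126) after 127 steps and every multiply from there until roughly step 150 touches subnormals.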

fwiw the linked issue has been remedied in recent compilers, and it isn't a python problem, it's a gcc problem. That said, if your algorithm requires subnormal numbers, for the love of numeric stability, guard your scopes and set the mxcsr register accordingly!
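The guard pattern above, as a sketch (x86-specific; the function name is illustrative): save the whole MXCSR on entry, force the mode you need, and restore the caller's mode on every exit path.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr, FTZ control */
#include <pmmintrin.h>  /* DAZ control */

/* Kernel that needs real subnormal arithmetic: force FTZ/DAZ off for
 * the duration of this scope, then restore whatever mode the caller
 * (or some stray -ffast-math'd shared library) had set. */
float sum_needs_subnormals(const float *x, int n) {
    unsigned int saved = _mm_getcsr();   /* save the full MXCSR */
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_OFF);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_OFF);

    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += x[i];                     /* subnormal inputs survive */

    _mm_setcsr(saved);                   /* restore before any return */
    return acc;
}
```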

  • A big problem with -ffast-math is that it causes isnan and isinf to be completely, silently broken (gcc and clang).

    Like, "oh you want faster FP operations? well then surely you have no need to ever be able to detect infinite or NaN values.."
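    A common workaround (a sketch, not tied to any particular codebase): classify through the raw bit pattern, which the optimizer can't assume away even under -ffast-math.

```c
#include <stdint.h>
#include <string.h>

/* NaN test via the raw bits: exponent all ones, mantissa nonzero.
 * Unlike isnan(), this can't be folded to "false" by -ffast-math's
 * assumption that NaNs never occur. */
int is_nan_bits(double x) {
    uint64_t u;
    memcpy(&u, &x, sizeof u);            /* defined way to inspect bits */
    return (u & 0x7ff0000000000000ull) == 0x7ff0000000000000ull
        && (u & 0x000fffffffffffffull) != 0;
}
```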

  • In practice, "some applications" seems to include almost all of NumPy and Python. Good call.

    Like with the Java sin() fixes: if you don't care about the results being correct why not constant-fold an arbitrary number? Way faster at run-time.

  • All numerical methods define "correct" to be within a range or to some precision. There are very few algorithms that need subnormal arithmetic (i.e. FTZ off) to be "correct" - the linked article and the article it links don't even have an example (there are good examples of where, say, -ffinite-math-only is super dangerous, because infs/NaNs are way more common than arithmetic on subnormal numbers).

      And yea, the fact that crtfastmath.o being linked into shared libraries fucked up the precision of some computations depending on library dependencies (and the order they're loaded!) was bad.. but it lingered in the entire Linux ecosystem for over a decade. So how bad was it, if it took that long to notice?

      If you have a numerical algorithm that requires subnormal arithmetic to converge, a) don't, that's super shaky; b) set/unset mxcsr at the top/bottom of your function and ensure you never unwind the stack without resetting it. It's preserved across context switches, so you're not going to get blown away by the OS scheduler.

      This isn't practical numerical methods in C 101 but it's at least 201. In practice you don't trust floats for bit exact math. Use different types for that.

      1 reply →

  • I find that building and testing my code with -Ofast and -ffast-math from the beginning helps to avoid a lot of the issues with them. Any new code that breaks with them on probably wasn't particularly stable anyway and should be rethought.

"what kind of math does the compiler usually do without this funsafemath flag? Sad dangerous math?"

  • There are things like floating point exceptions (IEEE 754) and subnormal numbers (numbers close to zero that carry less precision than the small "machine epsilon" approximation error). The idea is to degrade gracefully. These additional features require additional transistors and processing, which raises latency.

    If you really know (and want to know) what you are doing, turning this stuff off may help. Some people even advocate brute-forcing all 2^32 single floats in your test cases, because it is kind of feasible to do so: https://news.ycombinator.com/item?id=34726919